Foreword



As part of its mission, the International Association for the 
Evaluation of Educational Achievement is committed to the 
development of the community of researchers who work in 
the area of assessment both nationally and internationally. The 
association also has a commitment to provide policymakers 
with the types of data and analyses that will further their 
understanding of student achievement and the antecedent 
factors that are implicated in student learning. 

As part of a larger strategy to achieve these broad goals, the 
IEA sponsors a research conference every two years as a 
means of providing opportunities for new researchers and 
more experienced scholars to meet, discuss, and present the 
findings of their work as it relates to the secondary analysis of 
IEA studies. The proceedings of the Second IEA International 
Research Conference, which was held in Washington DC, 
November 2006, and hosted by the Brookings Institution, are 
published here in two volumes. 



The papers in Volume 1 of the proceedings have as their central
focus the Trends in International Mathematics and Science Study (TIMSS).
Volume 2 brings together papers that focus on the Progress
in International Reading Literacy Study (PIRLS), the Second
Information Technology in Education Study (SITES), and
the Civic Education Study (CivEd).

IEA is grateful to everyone who participated in this conference 
and hopes that the papers provided here will interest those who 
work in the various areas of educational research represented 
in these pages. 

We look forward to future contributions to our conferences, 
and hope that these papers not only contribute to our 
understanding of educational achievement but also lead to 
the development of the community of researchers involved in 
international and national assessment. 




Hans Wagemaker PhD 

EXECUTIVE DIRECTOR, INTERNATIONAL ASSOCIATION FOR THE 
EVALUATION OF EDUCATIONAL ACHIEVEMENT 




Effects of science beliefs and instructional strategies on achievement of 
students in the United States and Korea: Results from the TIMSS 2003 
assessment 

J. Daniel House 
Northern Illinois University 
DeKalb, Illinois, USA



Abstract 

Several instructional strategies have been designed to 
improve student achievement in science. In addition, 
longitudinal research shows that student beliefs are 
significant predictors of science achievement. The 
purpose of this study was to use data from the Trends 
in International Mathematics and Science Study 2003 
(TIMSS 2003) assessment to identify relationships 
between the science achievement of students in the United 
States and Korea, the classroom instructional strategies 
they experienced, and the beliefs they held about their 
learning in science. Because of the complex sampling 
design of the TIMSS 2003 assessment, jackknife variance 
estimation procedures using replicate weights were used 
to compute appropriate standard errors for each variable 
in this study. Multiple regression procedures were used 
to simultaneously assess the relative contribution of 
each science belief variable and instructional strategy 
toward the explanation of science test scores. There were 
several significant findings from this study. Frequent
use of active learning strategies related positively to
achievement test scores for students in both countries. 
Students who frequently engaged in cooperative learning 
activities (worked in small groups on an experiment or 
investigation) tended to earn higher science test scores. 
Students from both countries who indicated positive 
self-appraisals of their science ability (usually did well 
in science and learned things quickly in science) earned 
higher achievement test scores. Conversely, students 
who compared themselves negatively to other students 
(science was more difficult for them than for many of their 
classmates) tended to earn lower science test scores. The 
findings also identified significant relationships between 
several instructional strategies and science achievement, 
as well as between several science beliefs and science 
achievement. These results emphasize the importance of 
simultaneously considering instructional strategies and 
student beliefs when assessing factors related to science 
achievement. 



Introduction 

There is considerable interest in the design of effective 
instruction for teaching and learning in science. Rillero 
(2000) observed that early success in science can be 
facilitated through hands-on experiences that develop 
an interest in science. Student success in science further 
develops skills (such as classification, measurement, 
understanding variables, and analyzing data) that 
lead to success in many academic subjects. Innovative 
programs have been developed to provide elementary 
and secondary school students with hands-on 
opportunities to learn science concepts and laboratory 
techniques (Doyle, 1999). A program coordinated by 
the University of California (Los Angeles) enables high 
school students and teachers to use integrated science 
learning and technology activities to improve student 
problem-solving skills and science knowledge (Palacio-Cayetano,
Kanowith-Klein, & Stevens, 1999). In a
program conducted by the University of California
(San Francisco), medical students visit Grade 6
classrooms and provide instruction on topics related 
to health and biological sciences (Doyle, 1999). 

Two instructional approaches shown to be effective 
for improving student achievement in science are active 
learning strategies and cooperative learning activities. 
Recent findings indicate that the use of active learning 
materials results in improved science achievement and 
more positive attitudes toward science (McManus, 
Dunn, & Denig, 2003). Kovac (1999) similarly found 
that the use of active learning strategies for a general 
chemistry course led to improved student achievement, 
while Lunsford and Herzog (1997) found use of 
student-centered classroom instructional strategies 
for a life sciences course resulted in positive student 
responses and was effective for student performance 
on standardized exams. Finally, Maheady, Michielli-Pendl,
Mallette, and Harper (2002) found the use of
active learning strategies for Grade 6 science associated
with positive attitudes regarding learning gains and 
high levels of motivation to succeed in science. 

With respect to cooperative learning, results from
cross-cultural research indicate that the use of cooperative
learning groups for earth science results
in higher achievement test scores and more positive
attitudes toward science for high school students in
Taiwan (Chang & Mao, 1999). Results from meta-analyses
of the effects of cooperative learning on
science outcomes indicate that a cooperative web-based
learning environment designed to foster
intrinsic motivation for learning science (Wang & 
Yang, 2002) is an effective strategy for improving 
student performance, facilitating more positive 
attitudes toward science, and promoting persistence 
into more advanced science and mathematics courses 
(Bowen, 2000; Springer, Stanne, & Donovan, 1999). 
This body of work shows that several instructional 
strategies have the simultaneous goals of improved 
learning outcomes and increased student motivation 
for learning science. 

A number of studies note that student beliefs are 
significantly associated with science achievement. 
Recent findings indicate that the motivational beliefs 
of high school students are significant predictors 
of science achievement test scores (Kupermintz & 
Roeser, 2002). Junior-high school students in Taiwan 
who expressed more positive attitudes toward science 
also tended to earn higher science test scores (Tuan, 
Chin, & Shieh, 2005). House (1996) found specific 
student beliefs (self-ratings of overall academic ability 
and drive to achieve, and expectations of graduating 
with honors) to be significant predictors of grade 
performance in introductory chemistry. In another 
study by House (2000a), academic self-concept and 
achievement expectancies significantly correlated with 
the grades earned by students in science, engineering, 
and mathematics disciplines. Singh, Granville, and Dika
(2002) found a strong relationship between students’ 
science attitudes and time spent on academic activities 
and science homework. DeBacker and Nelson (2000) 
observed that high school students who expressed high 
academic goals and held a high value for science were 
also more likely to show higher science achievement. 
Taken together, these results emphasize the importance 
of considering student beliefs when assessing factors 
related to science achievement outcomes. 



Results from international assessments indicate that 
students in Korea typically score above international 
averages (Martin et al., 2000). Consequently, there
has been a continuing interest in identifying factors 
related to science achievement for students in Korea. 
Commentators observe that education is an important 
part of Korean society and that parents play a critical 
role in emphasizing academic success (Ellinger & 
Beckham, 1997; Sorenson, 1994). In addition, several 
studies have identified classroom practices associated 
with science achievement in Korea. Results from an 
analysis of high school chemistry classrooms in Korea 
indicated that students need to understand theoretical 
models in order to facilitate reflective thinking and 
to incorporate new learning material (Cho, Park, & 
Choi, 2000). In a study by Lee and Fraser (2000), high 
school students in Korea reported the development of 
constructivist strategies in their high school classes 
through the use of cooperative learning activities and 
relevant materials. Korean high school students who 
expressed specific views about science tended to show 
lower self-efficacy toward science and tended to be 
passive learners (Park & Choi, 2000). Results from a 
case study of science instruction in Korea (Oh, 2005) 
indicated that teachers employed three primary roles 
during class sessions: presenting science knowledge 
to the students through various activities, coaching 
to enhance science achievement, and scaffolding. 
There is also evidence that the use of cooperative 
learning activities results in higher achievement 
levels and more positive attitudes toward science for 
middle school students in Korea (Chung & Son, 
2000). Consequently, there is evidence of significant 
relationships between effective teaching strategies for 
science and the achievement outcomes of students in 
Korea. 

Research studies drawing on data from the TIMSS 
assessments show significant relationships between 
science outcomes and various student characteristics 
and instructional strategies. With respect to science test 
scores, results from an analysis conducted by House 
(2000b) of data relating to TIMSS 1995 students from 
Hong Kong found significant relationships between 
certain classroom strategies and science achievement. 
Students who earned higher test scores were those 
who reported frequently doing experiments or 
practical investigations in class and working together 
in pairs or small groups. Similarly, students in Japan
who earned higher science test scores indicated that
they frequently used active learning strategies during 
their science lessons and discussed practical or story 
problems related to everyday life when learning new 
science topics (House, 2002). Findings from the 
TIMSS 1999 assessment indicated that students in 
Japan, Hong Kong, and Chinese Taipei who earned 
the higher test scores were also those students who 
frequently used things from everyday life when 
solving science problems and did experiments or 
practical investigations in class (House, 2005). 
Elementary school students in Japan who frequently 
did experiments in class, worked together in pairs 
or small groups, and used computers during science 
lessons also tended to earn higher science test scores 
(House, 2006a). In addition, results from the TIMSS 
science performance assessment indicate that student 
performance on both procedural and higher-order 
thinking items contributes to achievement (Harmon, 
1999). 

Other studies have examined the importance 
of student beliefs for influencing achievement. 
Results for students in Australia revealed significant 
relationships between attitudes and aspirations and 
achievement outcomes (Webster & Fisher, 2000). 
House’s examinations of TIMSS data for students 
in Ireland and Hong Kong found that students who 
indicated they enjoyed learning science tended to 
earn higher test scores while students who attributed 
success in science at school to external factors (such as 
good luck) were more likely to earn lower science test 
scores (House, 2000c, 2003). Finally, an analysis of 
Grade 8 students from Cyprus indicated that teaching 
practices exerted a significant influence on attitudes 
toward science (Papanastasiou, 2002). 

The purpose of this study was to use data from the 
TIMSS 2003 assessment to simultaneously identify 
relationships between the science achievement of 
adolescent students in the United States and Korea, 
the classroom instructional strategies they experienced 
in relation to science, and their beliefs about their 
learning of this subject. Data relating to students from 
these countries were examined for two reasons. First, 
students from Korea have shown high levels of science 
achievement on previous international assessments 
(Martin et al., 2000). Second, previous cross-cultural
research has examined instructional strategies related 
to achievement for students from these countries, and 
this study provided opportunity to add to this body 
of work. 



Method 

The TIMSS 2003 assessment 

The TIMSS 2003 assessment examined target 
populations that were the two adjacent grades 
containing the largest proportions of nine-year-old
and 13-year-old students. Student assessments
were conducted during the spring of the 2002/2003 
school year. A matrix sampling procedure was used to 
compile test items into booklets because of the large 
number of science and mathematics test items on 
the assessment (Martin & Mullis, 2004). Eight test 
booklets were developed, and six blocks of items were 
included in each booklet (Smith Neidorf & Garden, 
2004). Representative samples of students took each 
part of the assessment. The intention of the TIMSS 
2003 assessment was to measure student performance 
on both mathematics and science at the Grade 4 and 
Grade 8 levels. 

Several procedures were used to select the schools 
within the Korean and the United States samples. For 
the sample of schools from Korea, initial stratifications 
were made according to province and urban status 
(large city, middle, rural). Further stratification was 
made by student gender in the schools (boys, girls, 
mixed). Remote schools, special education schools, 
and sports schools were excluded from the sampling. 
This procedure resulted in a total of 151 schools in the
sample. However, 149 schools actually participated 
in the TIMSS 2003 assessment. With regard to the 
United States sample, stratifications were made by 
school type (public/private) and region. Schools at the 
Grade 8 level were also stratified by minority status 
(more than 15% minority students/less than 15% 
minority students). This procedure resulted in 301 
schools in the sample and 232 schools in the TIMSS 
2003 assessment. 

Students 

The students included in these analyses were from the 
TIMSS 2003 Population 2 samples (13-year-olds) 
from the United States and from Korea. Of these 
students, 8,093 from the United States and 5,076 from 
Korea completed all of the measures regarding self- 
belief variables and instructional practices examined 
in this study.

Measures

As part of the TIMSS 2003 assessment, students were 
given a questionnaire that collected various data, 
including information regarding student beliefs about 
science and mathematics, classroom instructional 
activities, family characteristics, learning resources, 
out-of-school activities, and science achievement. 

This present study examined the effects of 
several classroom instructional activities on science 
achievement. Students indicated how frequently the 
following activities happened during their science 
lessons. “How often do you do these things in your science
lessons?”

1. We watch the teacher demonstrate an experiment
or investigation 

2. We formulate hypotheses or predictions to be 
tested 

3. We design or plan an experiment or investigation 

4. We conduct an experiment or investigation 

5. We work in small groups on an experiment or 
investigation 

6. We write explanations about what was observed 
and why it happened 

7. We relate what we are learning in science to our 
daily lives 

8. We review our homework 

9. We listen to the teacher give a lecture-style 
presentation 

10. We work problems on our own 

11. We begin our homework in class.

For these items, the original codings were 
transformed so that the following values were used to 
indicate the frequency of each activity: (1) never, (2) 
some lessons, (3) about half the lessons, (4) every or 
almost every lesson. 

Six specific measures were examined with respect 
to student beliefs about science: 

1. I usually do well in science

2. Science is more difficult for me than for many of 
my classmates 

3. I enjoy learning science 

4. Sometimes when I do not initially understand 
a new topic in science, I know that I will never 
really understand it 

5. Science is not one of my strengths 

6. I learn things quickly in science. 



For these items, original codings were transformed 
so that the following levels of agreement were indicated: 
(1) disagree a lot, (2) disagree a little, (3) agree a little, 
(4) agree a lot. 
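
The recoding described for both sets of items can be illustrated with a minimal sketch. It assumes, without confirmation from the text above, that the original codings ran in the opposite direction on a four-point scale (for example, 1 for "every or almost every lesson" or "agree a lot"); the function name and the scale_max parameter are hypothetical.

```python
def reverse_code(response, scale_max=4):
    """Flip a response on a 1..scale_max scale so that a higher value
    means a more frequent activity or stronger agreement.

    Assumption: the original coding ran from most to least frequent
    (or from most to least agreement) on the same number of points.
    """
    if not 1 <= response <= scale_max:
        raise ValueError("response outside the expected scale")
    return scale_max + 1 - response


# Hypothetical original responses for one student on four items.
original = [1, 2, 3, 4]
recoded = [reverse_code(value) for value in original]
```

Under this assumption an original response of 1 becomes 4 and an original 4 becomes 1, which matches the value labels listed above.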

The dependent measure examined in this study 
was each student’s science score on the TIMSS 
2003 assessment. Because students in the TIMSS 
2003 assessment were given relatively few test items 
in each specific content area, statistical procedures 
were developed to estimate student proficiency by 
generating plausible values for each student based on 
responses given (Gonzalez, Galia, & Li, 2004). Each 
plausible value provides an estimate of the performance 
of each student had they actually taken all possible 
items on the assessment. Five plausible score values 
were computed for each student because of error in 
the generation of these imputed proficiency values 
(Gonzalez et al., 2004). To provide consistency with 
the statistical procedures used for computing each 
national average score for mathematics achievement, 
the dependent measure used in this study was the 
average of the five plausible values generated for each 
student on the TIMSS 2003 science assessment. 
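
As a minimal sketch of how this dependent measure could be formed, the following averages five plausible values for one student. The function name and the example scores are hypothetical and are not taken from the TIMSS database.

```python
from statistics import mean


def mean_plausible_value(plausible_values):
    """Return the average of a student's five plausible values."""
    if len(plausible_values) != 5:
        raise ValueError("expected exactly five plausible values")
    return mean(plausible_values)


# Hypothetical plausible values for a single student.
student_score = mean_plausible_value([510.0, 520.0, 515.0, 505.0, 525.0])
```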

Procedure 

Statistical procedures applied to data collected using 
simple random sampling are inappropriate for data 
collected from assessments using complex sampling 
designs (Foy & Joncas, 2004). One potential problem of
using statistical procedures for simple random sampling 
on data collected from complex sampling designs is 
the possibility of underestimation of the error (Ross, 
1979). Underestimation of error can produce spurious 
findings of statistical significance in hypothesis testing 
(Wang & Fan, 1997). Consequently, it is critical when 
conducting appropriate statistical tests of significance 
that the design effect is considered and procedures are 
used that produce unbiased variance estimates. 

Because the TIMSS 2003 assessment employed a 
two-stage stratified cluster sample design, jackknife 
variance estimation procedures using replicate weights 
were used to compute appropriate standard errors for 
each variable included in this study. Brick, Morganstein,
and Valliant (2000) found that jackknife variance
procedures are an effective method for providing 
full-sample estimates for data collected from cluster 
sample designs. This technique simulates repeated 
sampling of students from the initial sample according
to the specific sample design (Johnson & Rust, 1992).
Sometimes referred to as a re-sampling plan, the 
technique produces estimates of the population means 
and the standard errors of those estimates (Welch, 
Huffman, & Lawrenz, 1998). An advantage of using 
the jackknife replication statistic is that it increases the 
generalizability of research findings because it provides 
population estimates rather than findings from a single 
sample (Ang, 1998). 
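
The general idea of jackknife estimation with replicate weights can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure used in this study: it assumes a paired design in which each sampling zone contains two units, and that each replicate sets the weight of one unit in a zone to zero and doubles the weight of the other. The names zones and halves and the example data are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean of values."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def jackknife_se(values, weights, zones, halves):
    """Return (full-sample estimate, jackknife standard error).

    zones[i]  : sampling zone of case i (assumed paired design)
    halves[i] : 0 or 1, the unit within the zone that case i belongs to
    For each zone, one replicate is formed by zeroing the weights of
    half 0 and doubling the weights of half 1 in that zone only.
    """
    full = weighted_mean(values, weights)
    variance = 0.0
    for zone in sorted(set(zones)):
        replicate = []
        for w, z, h in zip(weights, zones, halves):
            if z != zone:
                replicate.append(w)        # other zones are unchanged
            elif h == 1:
                replicate.append(2.0 * w)  # retained half is doubled
            else:
                replicate.append(0.0)      # dropped half
        variance += (weighted_mean(values, replicate) - full) ** 2
    return full, variance ** 0.5


# Hypothetical data: four cases in two zones.
estimate, standard_error = jackknife_se(
    values=[1.0, 2.0, 3.0, 4.0],
    weights=[1.0, 1.0, 1.0, 1.0],
    zones=[0, 0, 1, 1],
    halves=[0, 1, 0, 1],
)
```

The same loop applies to any statistic, including a regression coefficient: the statistic is recomputed once per replicate and the squared deviations from the full-sample value are summed.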

For this study, multiple regression procedures were 
used to simultaneously assess the relative contribution 
of each self-belief variable and classroom instructional 
strategy in explaining the science test scores. In each 
instance, analyses were conducted separately for the 
entire sample of students from each country. 
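
A weighted multiple regression of the kind described here can be sketched with the standard library alone. The sketch solves the weighted normal equations directly; the function names, the two predictors, and the data are hypothetical, and the software actually used for this study is not stated in the text.

```python
def solve_linear_system(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with pivoting."""
    n = len(vector)
    aug = [row[:] + [rhs] for row, rhs in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def weighted_least_squares(predictors, outcome, weights):
    """Return [intercept, b1, b2, ...] minimizing the weighted squared error."""
    rows = [[1.0] + list(p) for p in predictors]  # prepend intercept column
    size = len(rows[0])
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(size)] for i in range(size)]
    xtwy = [sum(w * r[i] * y for r, y, w in zip(rows, outcome, weights))
            for i in range(size)]
    return solve_linear_system(xtwx, xtwy)


# Hypothetical data: two recoded items (1-4) predicting a test score.
items = [(1, 4), (2, 3), (3, 1), (4, 2), (2, 2)]
scores = [500.0 + 10.0 * a - 5.0 * b for a, b in items]
coefficients = weighted_least_squares(items, scores, [1.0, 2.0, 1.0, 3.0, 1.0])
```

Because all predictors enter the equation together, each coefficient reflects the contribution of one variable with the others held constant. Standard errors would then come from refitting the model under each set of replicate weights.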

Results 

Table 1 presents a summary of the results from the 
multiple regression analysis of relationships between 
science beliefs, classroom instructional strategies, and 
achievement test scores for students in Korea. 

Five science belief variables significantly entered the 
multiple regression equation. Students who earned the 
higher test scores tended to indicate that they learned 
things quickly in science and usually did well in 
science. Interestingly, students who expressed negative 
self-appraisals of their ability to learn new science 
topics (“Sometimes when I do not initially understand 
a new topic in science, I know that I will never 
really understand it”) tended to earn higher science 
achievement test scores. Conversely, students who 
earned the lower science test scores tended to report 
that science was not one of their strengths. Similarly, 
students who earned lower test scores were the students 
most likely to express negative comparisons of their 
science ability relative to the ability of other students 
(“Science is more difficult for me than for many of my 
classmates”). 

With respect to instructional strategies, students 
who earned the higher test scores were those most 
likely to report that they frequently worked problems 
on their own and that they related what they were 
learning in science to their daily lives. Students who 
said they frequently listened to the teacher give a 
lecture-style presentation also tended to earn higher
science test scores. Frequent use of cooperative
learning strategies (“We work in small groups on an
experiment or investigation”) was positively associated
with science test scores. Three instructional strategies
showed significant negative relationships with science 
test scores. Students who earned lower test scores 
reported that they frequently conducted an experiment 
or investigation and watched the teacher demonstrate 
an experiment or investigation. Students who 
reported higher amounts of class time spent reviewing 
homework also tended to earn the lower test scores. 

The overall multiple regression equation that 
assessed the joint significance of the complete set of 
science belief variables and classroom instructional 
strategies was significant (F(17, 59) = 70.53, p <
.001) and explained 31.0% of the variance in science 
achievement test scores for adolescent students in 
Korea. 

Findings from the multiple regression analysis 
of relationships between science beliefs, classroom 
instructional strategies, and science achievement test 
scores for students in the United States are presented 
in Table 2. Five science belief variables and seven 
instructional strategies significantly entered the 
multiple regression equation. Students who earned 
the higher test scores tended to indicate they learned 
things quickly in science and usually did well in science. 
Conversely, students who earned lower science test 
scores were more likely to report that science was not 
one of their strengths. However, students who reported 
that they enjoyed learning science actually earned 
lower test scores. In addition, students who earned 
lower test scores expressed lower self-appraisals of their
ability to learn new science topics (“Sometimes when 
I do not initially understand a new topic in science, I 
know that I will never really understand it”). 

With respect to classroom instructional strategies, 
students who earned higher test scores reported that 
they more frequently conducted an experiment or 
investigation in class and worked problems on their 
own. Students who earned higher test scores also 
reported that they frequently engaged in cooperative 
learning activities (in small groups on an experiment 
or investigation). Four classroom instructional 
strategies showed significant negative relationships 
with science test scores. For instance, those students 
who more frequently watched the teacher demonstrate 
an experiment or investigation and who formulated 
hypotheses or predictions to be tested earned lower test 
scores. Similarly, students who earned lower test scores 
reported that they frequently designed or planned an
experiment or investigation and related what they were
learning in science to their daily lives.

Table 1: Relationships between Science Beliefs, Classroom Instructional Strategies, and Science Achievement Test
Scores (Korea)

Self-belief/Instructional activity                        Parameter  Standard error  Z-score
                                                           estimate     of estimate

Science beliefs
I usually do well in science                                 27.197           2.131    12.76**
Science is more difficult for me than for many of my         -5.576           1.509    -3.69**
  classmates
I enjoy learning science                                     -1.446           1.429    -1.01
Sometimes when I do not initially understand a new            4.523           1.355     3.34**
  topic in science, I know that I will never understand it
Science is not one of my strengths                           -4.792           1.617    -2.96**
I learn things quickly in science                             5.961           1.638     3.64**

Instructional strategies
We watch the teacher demonstrate an experiment or            -7.525           1.198    -6.28**
  investigation
We formulate hypotheses or predictions to be tested           0.779           1.624     0.48
We design or plan an experiment or investigation              0.039           1.811     0.02
We conduct an experiment or investigation                    -5.178           1.866    -2.78**
We work in small groups on an experiment or                   7.921           1.313     6.03**
  investigation
We write explanations about what was observed and             1.973           1.223     1.61
  why it happened
We relate what we are learning in science to our              4.603           1.248     3.69**
  daily lives
We review our homework                                       -5.436           1.261    -4.31**
We listen to the teacher give a lecture-style                 8.526           1.385     6.16**
  presentation
We work problems on our own                                  15.373           1.199    12.82**
We begin our homework in class                               -1.730           1.408    -1.23

Note: **p < .01.

The overall multiple regression equation that 
assessed the joint significance of the complete set of 
science beliefs variables and classroom instructional 
strategies was significant (F(17, 59) = 30.32, p <
.001) and explained 14.6% of the variance in science 
achievement test scores for adolescent students in the 
United States. 
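
The Z-scores reported in Tables 1 and 2 appear consistent with each parameter estimate divided by its standard error. The following sketch checks this for two reported rows and adds a two-sided p-value under a standard normal approximation. That approximation is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def z_score(estimate, standard_error):
    """Ratio of a parameter estimate to its standard error."""
    return estimate / standard_error


def two_sided_p(z):
    """Two-sided p-value for z under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# "I usually do well in science" (Korea): estimate 27.197, SE 2.131.
z_korea = z_score(27.197, 2.131)

# "We conduct an experiment or investigation" (United States):
# estimate 14.379, SE 1.948.
z_us = z_score(14.379, 1.948)
```

Rounded to two decimals, these ratios reproduce the tabled values of 12.76 and 7.38.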



Discussion 

Several significant findings emerged from this study. 
A number of specific science beliefs were significantly 
associated with science achievement test scores for 
students in the United States and Korea. Students 
from both countries who expressed positive beliefs 
about their science abilities (usually did well in 
science and learned things quickly in science) also 
tended to earn higher science test scores. Students 
from both countries who held negative appraisals of 
their science ability (considered science was not one 
of their strengths) were more likely to earn lower 
achievement test scores. Differences between students 
in the United States and Korea were also noted for 
relationships between science beliefs and test scores. 
Students in the United States who indicated they 
enjoyed learning science actually earned lower test
scores, but this relationship was not significant for
students in Korea. Students in the United States who
expressed negative appraisals of their ability to learn
new science information (“Sometimes when I do not
initially understand a new topic in science, I know that
I will never really understand it”) tended to earn lower
science test scores; the same relationship was positive
for students in Korea. However, the results of this
study indicate that student beliefs related significantly
to science test scores and need to be considered when
assessing factors associated with science achievement.

Table 2: Relationships between Science Beliefs, Classroom Instructional Strategies, and Science Achievement Test Scores
(United States)

Self-belief/Instructional activity                        Parameter  Standard error  Z-score
                                                           estimate     of estimate

Science beliefs
I usually do well in science                                 13.075           1.920     6.81**
Science is more difficult for me than for many of my         -2.188           1.381    -1.58
  classmates
I enjoy learning science                                     -4.280           1.544    -2.77*
Sometimes when I do not initially understand a new          -11.008           1.520    -7.24**
  topic in science, I know that I will never understand it
Science is not one of my strengths                           -5.595           1.387    -4.04**
I learn things quickly in science                             8.167           1.908     4.28**

Instructional strategies
We watch the teacher demonstrate an experiment or            -4.076           1.719    -2.37*
  investigation
We formulate hypotheses or predictions to be tested          -4.362           1.791    -2.44*
We design or plan an experiment or investigation            -12.330           1.633    -7.55**
We conduct an experiment or investigation                    14.379           1.948     7.38**
We work in small groups on an experiment or                   6.960           1.995     3.49**
  investigation
We write explanations about what was observed and            -1.366           1.514    -0.90
  why it happened
We relate what we are learning in science to our             -4.607           1.373    -3.36**
  daily lives
We review our homework                                       -0.025           1.478    -0.02
We listen to the teacher give a lecture-style                 1.826           1.500     1.22
  presentation

Note: **p < .01; *p < .05.

With respect to classroom instructional strategies, 
a number of specific activities related significantly to 
science achievement for students in the United States 
and Korea. More frequent use of cooperative learning 
activities (students working in small groups on an 
experiment or investigation) was positively associated 
with science achievement test scores for students from
both countries. Similarly, active learning strategies
(students working problems on their own) were 
positively related to science achievement for students 
from both countries. Conversely, students from both 
countries who reported that they frequently were 
passive learners during their science lessons (they 
watched the teacher demonstrate an experiment or 
investigation) tended to earn lower science test scores. 

Several differences between students in the 
United States and Korea were also found in terms 
of relationships between instructional strategies and 
science achievement. For instance, students from the 
United States who earned lower science test scores 
reported that they frequently formulated hypotheses 
or predictions to be tested and designed or planned an 
experiment or investigation; these relationships were 
not significant for students from Korea. Students from 
the United States who reported frequently conducting
experiments or investigations during science lessons
also tended to earn higher test scores; the same 
relationship was negative for students from Korea. 
Students from Korea who reported frequently making 
real-world connections to their science material (they 
related what they were learning in science to their daily 
lives) also tended to earn higher science test scores. 
However, this relationship was negative for students 
in the United States. These findings indicate that the 
assessment of classroom instructional strategies is 
critical for understanding student outcomes. These 
results also suggest cross-cultural similarities and 
differences in the relationship between instructional 
activities and science achievement. 

Several of the findings from this study are consistent
with the results of previous research. One
important group of findings in the study concerned the 
significant relationships between a number of student 
self-beliefs and science achievement — relationships 
that held even after the effects of classroom instruction 
had been taken into account. For instance, students 
who expressed positive beliefs about learning science 
also tended to earn higher science test scores. House 
(2006b) similarly found, even after considering the 
effects of several types of instructional strategies, 
significant relationships between several mathematics 
beliefs of adolescent students in Japan and their algebra 
achievement. Results from a developmental analysis of 
the relationship between academic self-concept and 
achievement showed a significant causal connection 
between the self-beliefs of elementary school students 
and teacher ratings of performance (Guay, Marsh, &
Boivin, 2003). Earlier work by House (1994) and by
Vallerand, Fortier, and Guay (1997) showed beliefs
to be significant predictors of withdrawal from high 
school and of grade performance in science courses. 
Beliefs are thus an important factor in determining 
student achievement outcomes.

Abstract

Algebra knowledge is a critical part of middle school 
mathematics achievement. Success in algebra is necessary 
for taking higher-level mathematics courses and leads to 
higher scores on standardized tests. The purpose of this 
study was to use data from the Trends in International 
Mathematics and Science Study 2003 (TIMSS 2003) 
assessment to identify relationships between the algebra 
achievement of adolescent students in the United States 
and Japan, the beliefs these students held about their 
learning in this subject, and the classroom instructional 
strategies they experienced in relation to it. Jackknife 
variance estimation procedures using replicate weights 
were used to compute appropriate standard errors for each 
variable in this study. Multiple regression procedures were
used to simultaneously examine the relative contribution
of each instructional activity and mathematics belief
variable toward explaining algebra
achievement test scores. Students from both the United
States and Japan who earned higher algebra test scores 
were more likely to indicate positive beliefs about their 
mathematical ability (they learned things quickly in 
mathematics and usually did well in mathematics). 
Students who earned lower algebra test scores compared 
themselves negatively to other students. With respect to instructional
practices, students from both countries who had
frequent opportunity to work problems on their own 
tended to earn higher algebra test scores. The study 
also found the mathematics beliefs of the United States 
and Japanese students and the classroom instructional 
practices they experienced to be significantly related to 
algebra achievement. Cross-cultural similarities and 
differences were also noted for these relationships. These 
results have implications for mathematics instruction and 
identify strategies to improve algebra achievement. 



Introduction 

There is increasing interest in identifying factors 
associated with mathematics achievement. Many 
career options are open only to students who have 
mastered mathematical skills and enrolled in advanced 
mathematics courses (House, 1993). Algebra knowledge 
is a critical part of middle school mathematics 
achievement. Student success in algebra is necessary 
for taking higher-level mathematics courses and leads 
to higher scores on standardized tests (Catsambis, 
1994; Telese, 2000). Instructional strategies such as 
the use of appropriate problem-solving activities have 
been used to foster student achievement in algebra. 
An algebra curriculum centered on problem-solving 
and incorporating real-world applications provides 
students with opportunities to succeed in algebra. 
Farrell and Farmer (1998) have identified eight modes
of instruction that can be applied to algebra teaching:
lecture, question/answer, discussion, demonstration, 
laboratory, individual student projects, supervised 
practice, and technological activities. Research findings 
indicate that instruction in introductory algebra should 
include activities that incorporate the use of informal 
knowledge, application to real-world settings, and 
applications of mathematical thinking (Telese, 2000). 
These studies highlight the importance of examining 
student and instructional factors related to algebra 
achievement in order to improve opportunities for 
success in mathematics. 

Several studies have examined the relationship 
between student beliefs and academic achievement. 
A longitudinal study by House (1997), for example, 
found the initial self-beliefs of a sample of Asian-American
students to be significant predictors of their
grade performance. Marsh and Yeung (1997) found
that the academic self-concept of adolescent students 
exerts significant causal effects on their mathematics 
achievement. Similarly, earlier studies by House (1993, 
1995) with a group of older adolescent students found 
their academic self-concept to be significantly related 
to higher mathematics course grades. Another study 
by House (2001a), this time with American Indian/ 
Alaska Native students, found significant correlations 
between specific facets of these students’ academic self- 
concept and achievement expectancies (self-ratings 
of overall academic ability and mathematical ability, 
and expectations of making at least a B average in 
college) and their mathematics achievement. Results 
from a longitudinal study of middle school students 
in Germany indicated that students who expressed 
higher initial levels of interest in mathematics were 
those students most likely to subsequently enroll in 
advanced mathematics courses (Köller, Baumert, &
Schnabel, 2001). Significant relationships have also 
been found between the mathematics self-efficacy and 
the subsequent mathematics achievement of middle 
school students (Pajares & Graham, 1999). Research
findings for middle school students also indicate that 
students who express more focused learning goals tend 
to have a higher mathematics self-concept (Anderman 
& Young, 1993). 

Examinations of cross-cultural differences in the 
relationship between student beliefs and mathematics 
achievement indicate the importance of considering 
student beliefs when assessing factors that influence 
mathematics achievement. A recent study by Ercikan, 
McCreith, and Lapointe (2005) found that self- 
confidence in mathematics was the factor most 
strongly associated with mathematics achievement for 
students in Norway and Canada, but not for students 
in the United States. Another study, by Tsao (2004),
found that Grade 5 students in Taiwan tended to have more
positive beliefs about mathematics than did students 
in the United States. 

Considerable interest has been directed toward the 
mathematics achievement of students in Japan, and 
research has examined instructional strategies used 
for mathematics teaching and learning. For instance, 
the Learner’s Perspective Study (LPS), involving an 
analysis of mathematics classrooms in nine countries, 
found that students in Japan discuss their strategies 
for solving problems set during the lesson and make
presentations to the rest of the class (Shimizu, 2002).
Sawada (1999) notes that mathematics instruction 
in Japan focuses on the development of problem- 
solving strategies, with entire class sessions oriented 
toward a single problem. An observational analysis of 
classrooms conducted by Stigler, Lee, and Stevenson 
(1987) found a significantly higher proportion of class
time spent on mathematics instruction in classrooms 
in Japan, and more time spent on other activities such 
as classroom management in United States classrooms. 
Becker, Silver, Kantowski, Travers, and Wilson (1990) 
note that teachers in Japanese classrooms tend to 
present multiple strategies for solving mathematics 
problems, while Kroll and Yabe (1987) describe 
teaching strategies used in Japan that incorporate 
manipulative materials designed to help students 
develop flexible thinking about methods for solving 
mathematics problems. These strategies have led to 
observations that elementary school students in Japan 
explain solutions to problems in ways that incorporate 
more complex mathematical concepts (Silver, Leung, 
& Cai, 1995). Finally, according to Perry (2000), 
teachers in Japanese classrooms provide more extended 
explanations to their students. The results of these 
studies highlight cultural differences in mathematics 
classroom practices and problem-solving strategies. 

Several studies have successfully used data from 
the TIMSS assessments to identify student and 
instructional factors associated with the mathematics 
outcomes of students in the United States and Japan. 
For example, results from the TIMSS Videotape 
Classroom Study indicate that students in Japan 
spend a considerable amount of time during 
mathematics lessons developing solutions to problems 
and examining a single problem (Stigler, Gallimore, 
& Hiebert, 2000). Japanese students also are likely 
during mathematics lessons to present alternative 
strategies for solving mathematics problems and to 
cover advanced mathematical content (Shimizu, 1999; 
Stigler, Gonzales, Kawanaka, Knoll, & Serrano, 1999). 
A case study of a geometry lesson in Japan, conducted as part of
the TIMSS 1995 Videotape Classroom Study, found that
instructional activities incorporated into computer-based
mathematics teaching enhanced student
attention and interest (House, 2002).

Results from the TIMSS 1995 assessment showed
significant associations between specific instructional
activities and the mathematics achievement of
students in Japan. Those students who earned the
higher test scores were also those students frequently
assigned homework, who used things from everyday 
life when solving mathematics problems, and who 
tried to solve problems related to new mathematics 
topics when learning new material (House, 2001b). 
The TIMSS 1999 assessment also found that the 
students in Japan who earned the higher mathematics 
test scores were those who tended to frequently receive 
homework (House, 2004). Conversely, those students 
who frequently spent time during mathematics lessons 
checking one another’s homework and/or having 
the teacher check homework tended to earn lower 
test scores. Telese (2004) found that students in the 
United States who frequently used calculators during 
mathematics lessons also showed higher algebra test 
scores. 

Research from the TIMSS assessments also 
highlights instructional practices associated with 
interest in learning mathematics for students in Japan. 
For example, students who expressed enjoyment when 
learning mathematics were the students most likely to 
report discussing practical or story problems related to 
everyday life, working on mathematics projects, and 
engaging in cooperative learning (working together 
in pairs or small groups on problems or projects) 
during their mathematics lessons (House, 2003, 
2005). Students in Japan who attributed success 
in mathematics to controllable factors (hard work 
studying at home and memorizing the textbook or 
notes) tended to gain higher test scores while the 
students who attributed success to external factors 
(good luck) tended to show lower achievement levels 
(House, 2006a). 

The purpose of this present study was to use data 
from the TIMSS 2003 assessment to simultaneously 
identify relationships between the algebra achievement 
of adolescent students in the United States and Japan, 
their beliefs about their learning of algebra, and the 
classroom instructional strategies they experienced 
in relation to this subject. Data relating to students 
from these two countries were examined for two 
reasons. First, students from Japan have scored above 
international averages on previous mathematics 
assessments (Kelly, Mullis, & Martin, 2000). Second, 
previous cross-cultural studies have examined factors 
associated with mathematics achievement for students 
in these two countries, and this study provided 
opportunity to add to this body of work. 



Method 

The TIMSS 2003 assessment 

The TIMSS 2003 assessment examined target 
populations that were the two adjacent grades 
containing the largest proportions of nine-year-old
and 13-year-old students. Student assessments
were conducted during the spring of the 2002/2003 
school year. A matrix sampling procedure was used to 
compile test items into booklets because of the large 
number of science and mathematics test items on 
the assessment (Martin & Mullis, 2004). Eight test 
booklets were developed and six blocks of items were 
included in each booklet (Smith Neidorf & Garden, 
2004). Representative samples of students took each 
part of the assessment. The intention of the TIMSS 
2003 assessment was to measure student performance 
on both mathematics and science at the Grade 4 and 
Grade 8 levels. 

Several procedures were used to select the schools 
within the Japanese and the United States samples. For 
the sample of students from Japan, initial stratifications 
were made in order to exclude schools for educable 
mentally disabled students and functionally disabled 
students. Further stratification was made by level of 
urbanization (big city area, city area, and non-city 
area). This procedure resulted in a total sample of 150 
schools, all of which participated in the TIMSS 2003 
assessment. For the United States sample, stratifications 
were made by school type (public/private) and region. 
Schools at the Grade 8 level were also stratified by 
minority status (more than 15% minority students/less 
than 15% minority students). This procedure resulted 
in 301 schools in the sample, of which 232 schools 
participated in the TIMSS 2003 assessment. 

Students 

The students included in these analyses were from the 
TIMSS 2003 Population 2 samples (13-year-olds) 
from the United States and Japan. Of these students, 
4,244 from Japan and 7,862 students from the 
United States completed all of the measures regarding 
classroom instructional strategies and mathematics 
beliefs examined in this study. 

Measures 

As part of the TIMSS 2003 assessment, students were 
given a questionnaire that collected various data, 
including information regarding student beliefs about
science and mathematics, classroom instructional
activities, family characteristics, learning resources, 
out-of-school activities, and science and mathematics 
achievement. 

This present study examined the influence of several 
mathematics beliefs on mathematics achievement. The 
items included in these analyses were: 

1. I usually do well in mathematics

2. I would like to take more mathematics in school 

3. Mathematics is more difficult for me than for 
many of my classmates 

4. I enjoy learning mathematics 

5. Sometimes when I do not initially understand 
a new topic in mathematics, I know that I will 
never really understand it 

6. Mathematics is not one of my strengths 

7. I learn things quickly in mathematics 

8. I think learning mathematics will help me in my 
daily life 

9. I need mathematics to learn other school 
subjects 

10. I need to do well in mathematics to get into the 
university of my choice 

11. I would like a job that involved using 
mathematics 

12. I need to do well in mathematics to get the job 
I want. 

For these items, the original codings were 
transformed so that the following levels of student 
agreement were indicated: (1) disagree a lot, (2) 
disagree a little, (3) agree a little, or (4) agree a lot. 
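The recoding described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the raw questionnaire codes run in the opposite direction (1 = agree a lot through 4 = disagree a lot); the function name `reverse_code` is hypothetical and is not taken from the study or from any TIMSS software.

```python
def reverse_code(response, n_levels=4):
    """Flip a 1..n_levels code so that higher values mean stronger agreement.

    Assumption (not stated in the paper): raw codes are 1 = agree a lot
    through 4 = disagree a lot. Missing or out-of-range codes become None.
    """
    if response is None or not 1 <= response <= n_levels:
        return None
    return n_levels + 1 - response


# Hypothetical raw responses to one belief item (9 stands for "omitted")
raw = [1, 2, 3, 4, 9]
recoded = [reverse_code(r) for r in raw]
print(recoded)
```

Under that assumption, the same transformation would also yield the frequency scale used for the instructional activity items.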

With respect to classroom instructional activities, 
students indicated how frequently the following 
strategies were used in their mathematics lessons: 

1. We practice adding, subtracting, multiplying, 
and dividing without using a calculator 

2. We work on fractions and decimals 

3. We interpret data in tables, charts, or graphs 

4. We write equations and functions to represent 
relationships 

5. We work together in small groups 

6. We relate what we are learning in mathematics to 
our daily lives 

7. We explain our answers 

8. We decide on our own procedures for solving 
complex problems 

9. We review our homework 

10. We listen to the teacher give a lecture-style 
presentation 



11. We work problems on our own

12. We begin our homework in class 

13. We have a quiz or test 

14. We use calculators. 

For each of these items, the original codings were 
transformed so that the following values were used to 
indicate the frequency of each activity: (1) never, (2) 
some lessons, (3) about half the lessons, (4) every or 
almost every lesson. 

The dependent measure examined in this study 
was each student’s algebra score on the TIMSS 
2003 assessment. Because students in the TIMSS 
2003 assessment were given relatively few test items 
in each specific content area, statistical procedures 
were developed to estimate student proficiency by 
generating plausible values for each student based on 
responses given (Gonzalez, Galia, & Li, 2004). Each 
plausible value provides an estimate of the performance 
of each student had they actually taken all possible 
items on the assessment. Five plausible score values 
were computed for each student because of error in 
the generation of these imputed proficiency values 
(Gonzalez et al., 2004). To provide consistency with 
the statistical procedures used for computing each 
national average score for mathematics achievement, 
the dependent measure used in this study was the 
average of the five plausible values generated for each 
student on the TIMSS 2003 algebra assessment. 
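As a small illustration of the dependent measure, the sketch below averages five plausible values for one student. The function name and the example values are hypothetical; they are not drawn from the TIMSS 2003 database.

```python
from statistics import mean


def algebra_score(plausible_values):
    """Return the average of a student's five plausible values."""
    if len(plausible_values) != 5:
        raise ValueError("expected exactly five plausible values")
    return mean(plausible_values)


# Hypothetical plausible values for a single student
student_pvs = [500.0, 510.0, 490.0, 505.0, 495.0]
print(algebra_score(student_pvs))
```

Averaging the plausible values gives one score per student, which is convenient for regression. It does not carry forward the variation among the five values, so it should be read as a simplification.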

Procedure 

Statistical procedures applied to data collected using 
simple random sampling are inappropriate for data 
collected from assessments using complex sampling 
designs (Foy & Joncas, 2004). One potential problem of
using statistical procedures for simple random sampling 
on data collected from complex sampling designs is 
the possibility of underestimation of the error (Ross, 
1979). Underestimation of error can produce spurious 
findings of statistical significance in hypothesis testing 
(Wang & Fan, 1997). Consequently, it is critical when 
conducting appropriate statistical tests of significance 
that the design effect is considered and procedures are 
used that produce unbiased variance estimates. 

Because the TIMSS 2003 assessment employed a 
two-stage stratified cluster sample design, jackknife 
variance estimation procedures using replicate weights 
were used to compute appropriate standard errors for 
each variable included in this study. Brick, Morganstein, 






J. D. HOUSE & J. A. TELESE: RELATIONSHIPS BETWEEN STUDENT AND INSTRUCTIONAL FACTORS AND ALGEBRA ACHIEVEMENT 



and Valliant (2000) found that jackknife variance 
procedures are an effective method for providing 
full-sample estimates for data collected from cluster 
sample designs. This technique simulates repeated 
sampling of students from the initial sample according 
to the specific sample design (Johnson & Rust, 1992). 
Sometimes referred to as a re-sampling plan, this 
technique produces estimates of the population means 
and the standard errors of those estimates (Welch, 
Huffman, & Lawrenz, 1998). An advantage of using 
the jackknife replication statistic is that this method 
increases the generalizability of research findings 
because it provides population estimates rather than 
findings from a single sample (Ang, 1998). 
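The general idea of jackknife variance estimation with replicate weights can be sketched as follows. This is a minimal illustration for a weighted mean, not the authors' procedure. It assumes a paired scheme in which schools are grouped into sampling zones and each replicate doubles the weights of one flagged half of a zone, sets the other half to zero, and leaves all other zones unchanged. All names and data are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jackknife_mean_se(values, weights, zones, flags):
    """Full-sample weighted mean and its jackknife standard error.

    zones: sampling-zone id for each student (assumed pairing of schools)
    flags: 1 if the student's school is the half that is doubled when its
           zone is perturbed, 0 if it is the half that is dropped
    """
    full = weighted_mean(values, weights)
    variance = 0.0
    for zone in sorted(set(zones)):
        # Replicate weights: perturb only the current zone
        rep_weights = [
            w * 2 * f if z == zone else w
            for w, z, f in zip(weights, zones, flags)
        ]
        variance += (weighted_mean(values, rep_weights) - full) ** 2
    return full, math.sqrt(variance)


# Hypothetical data: four students in two zones
est, se = jackknife_mean_se(
    values=[1.0, 3.0, 5.0, 7.0],
    weights=[1.0, 1.0, 1.0, 1.0],
    zones=[1, 1, 2, 2],
    flags=[1, 0, 1, 0],
)
print(est, se)
```

The same loop applies to any statistic: the estimate is recomputed once per replicate, and the squared deviations from the full-sample estimate are summed.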

For this study, multiple regression procedures were 
used to simultaneously assess the relative contribution 
of each self-belief variable and classroom instructional 
strategy in explaining the algebra test scores. In each 
instance, analyses were conducted separately for the 
entire sample of students from each country. 
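A weighted multiple regression of this kind can be sketched in plain Python by solving the normal equations. This is an illustrative sketch only: the paper does not describe its software, and the function name, the solver, and the toy data are all assumptions. With replicate weights, the same fit would be repeated once per replicate to obtain jackknife standard errors for each coefficient.

```python
def weighted_least_squares(X, y, w):
    """Return [intercept, b1, ..., bk] minimizing sum_i w_i * residual_i**2."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept
    k = len(rows[0])
    # Augmented normal equations [X'WX | X'Wy]
    A = [
        [sum(wi * r[a] * r[b] for r, wi in zip(rows, w)) for b in range(k)]
        + [sum(wi * r[a] * yi for r, wi, yi in zip(rows, w, y))]
        for a in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[r][k] for r in range(k)]


# Toy data generated from y = 2 + 3*x1 - 1*x2, with hypothetical weights
X = [[1, 2], [2, 1], [3, 5], [4, 3], [0, 1]]
y = [2 + 3 * x1 - x2 for x1, x2 in X]
w = [1.0, 2.0, 0.5, 1.5, 1.0]
coefficients = weighted_least_squares(X, y, w)
print(coefficients)
```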

Results 

Table 1 presents a summary of the results from the 
multiple regression analysis of relationships between 
mathematics beliefs, classroom instructional strategies, 
and algebra test scores for students in Japan. 

Seven mathematics belief variables significantly 
entered the multiple regression equation. Students who 
earned the higher algebra test scores were those most 
likely to indicate they usually did well in mathematics 
and enjoyed learning mathematics. Similarly, 
students who indicated they learned things quickly in 
mathematics also tended to earn higher algebra test 
scores. Students who earned higher test scores were also 
the students most likely to indicate they needed to do 
well in mathematics to gain entry to the university of 
their choice. Conversely, students who earned the lower 
algebra test scores tended to report that mathematics 
was not one of their strengths. In addition, students 
who expressed negative comparisons of themselves in 
terms of their mathematics ability relative to the ability 
of other students (“Mathematics is more difficult for 
me than for many of my classmates”) tended to earn 
the lower test scores. Students who reported that they 
would like to take more mathematics in school actually 
earned lower algebra test scores. 

Seven instructional strategies also significantly 
entered the multiple regression equation. Students 



who earned higher test scores were those who 
reported that they spent time practicing mathematical 
operations (adding, subtracting, multiplying, and 
dividing) without using a calculator. Students who 
showed higher algebra test scores also reported that 
they frequently decided on their own procedures for 
solving complex problems and explained their answers 
during mathematics lessons. Similarly, students who 
said they frequently worked problems on their own 
tended to earn the higher algebra test scores. 

Three instructional strategies showed significant 
negative relationships with algebra test scores. Frequent 
use of cooperative learning activities (students working 
together in small groups) was negatively related to 
algebra achievement. Students who showed lower 
algebra test scores also reported frequently relating 
what they were learning in mathematics to their daily 
lives. In addition, students who reported that they 
frequently used calculators during mathematics lessons 
tended to earn lower test scores. 

The overall multiple regression equation that 
assessed the joint significance of the complete set of 
mathematics beliefs and instructional strategies was 
significant (F(26, 49) = 64.84, p < .001) and explained
33.7% of the variance in algebra test scores for 
adolescent students in Japan. 
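The Z-scores reported in Table 1 appear to be the ratio of each parameter estimate to its standard error. The sketch below reproduces three of them from the rounded table values. The critical values 1.960 and 2.576 assume a two-tailed test against the standard normal distribution, which the paper does not state explicitly. Small differences in the last digit can arise for other rows because the published estimates are rounded.

```python
def z_score(estimate, standard_error):
    """Ratio of a parameter estimate to its standard error."""
    return estimate / standard_error


def significance_flag(z):
    """Flag matching the table note, assuming normal critical values."""
    magnitude = abs(z)
    if magnitude > 2.576:
        return "**"  # p < .01
    if magnitude > 1.960:
        return "*"   # p < .05
    return ""


# (estimate, standard error) pairs taken from Table 1 (Japan)
rows = {
    "I usually do well in mathematics": (25.359, 2.300),
    "I would like to take more mathematics in school": (-5.275, 2.123),
    "We work on fractions and decimals": (1.199, 1.672),
}
for label, (b, se) in rows.items():
    z = z_score(b, se)
    print(f"{label}: {z:.2f}{significance_flag(z)}")
```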

Findings from the multiple regression analysis of 
relationships between mathematics beliefs, classroom 
instructional strategies, and algebra test scores for 
students in the United States are summarized in Table 2. 

Ten mathematics belief variables significantly 
entered the multiple regression equation. Students 
who showed the higher algebra test scores were those 
students most likely to indicate that they usually did 
well in mathematics and learned things quickly in 
mathematics. Students who reported that they would 
like a job that involved using mathematics and that 
they needed to do well in mathematics to get into the 
university of their choice also earned higher test scores. 
Conversely, students who earned the lower algebra 
test scores were the students most likely to report 
that mathematics was not one of their strengths, and 
students who expressed negative comparisons of 
themselves relative to other students (“Mathematics is 
more difficult for me than for many of my classmates”) 
tended to obtain lower algebra test scores. Similarly, 
students who expressed negative self-appraisals of their
ability to learn new material (“Sometimes when I do

Table 1: Relationships between Mathematics Beliefs, Classroom Instructional Strategies, and Algebra Test Scores (Japan)

                                                             Parameter Standard errors
Self-belief/Instructional activity                            estimate     of estimate   Z-score

Mathematics beliefs
I usually do well in mathematics                                25.359           2.300    11.03**
I would like to take more mathematics in school                 -5.275           2.123    -2.48*
Mathematics is more difficult for me than for many              -2.977           1.482    -2.01*
  of my classmates
I enjoy learning mathematics                                     7.377           1.842     4.01**
Sometimes when I do not initially understand a new topic         0.404           1.169     0.34
  in mathematics, I know that I will never understand it
Mathematics is not one of my strengths                          -5.600           1.685    -3.32**
I learn things quickly in mathematics                            9.987           2.535     3.94**
I think mathematics will help in my daily life                  -3.700           1.952    -1.89
I need mathematics to learn other school subjects                1.123           1.959     0.57
I need to do well in mathematics to get into the                 5.126           1.423     3.60**
  university of my choice
I would like a job that involved using mathematics               1.008           2.134     0.47
I need to do well in mathematics to get the job I want           1.971           1.526     1.29

Instructional strategies
We practice adding, subtracting, multiplying, and               11.323           1.413     8.01**
  dividing without using a calculator
We work on fractions and decimals                                1.199           1.672     0.72
We interpret data in tables, charts, or graphs                  -1.612           2.460    -0.65
We write equations and functions to represent                   -0.260           1.743    -0.15
  relationships
We work together in small groups                                -6.500           2.191    -2.97**
We relate what we are learning in mathematics to our            -8.923           2.453    -3.64**
  daily lives
We explain our answers                                           5.166           1.648     3.13**
We decide on our own procedures for solving                      4.038           1.656     2.44*
  complex problems
We review our homework                                          -0.388           1.830    -0.21
We listen to the teacher give a lecture-style presentation       3.420           2.928     1.17
We work problems on our own                                     18.063           1.772    10.19**
We begin our homework in class                                   3.469           2.052     1.69
We have a quiz or test                                          -3.078           1.833    -1.68
We use calculators                                             -15.904           2.394    -6.64**

Note: ** p < .01; * p < .05.

Table 2: Relationships between Mathematics Beliefs, Classroom Instructional Strategies, and Algebra Test Scores (United States)

                                                             Parameter Standard errors
Self-belief/Instructional activity                            estimate     of estimate   Z-score

Mathematics beliefs
I usually do well in mathematics                                14.840           1.450    10.23**
I would like to take more mathematics in school                  1.544           1.237     1.26
Mathematics is more difficult for me than for many              -5.329           1.136    -4.69**
  of my classmates
I enjoy learning mathematics                                    -6.093           1.325    -4.60**
Sometimes when I do not initially understand a new topic        -9.369           0.980    -9.56**
  in mathematics, I know that I will never understand it
Mathematics is not one of my strengths                          -9.384           1.329    -7.40**
I learn things quickly in mathematics                            5.036           1.290     3.90**
I think mathematics will help in my daily life                 -11.844           1.348    -8.79**
I need mathematics to learn other school subjects               -0.394           1.384    -0.28
I need to do well in mathematics to get into the                12.115           1.718     7.05**
  university of my choice
I would like a job that involved using mathematics               6.092           1.409     4.32**
I need to do well in mathematics to get the job I want          -6.423           1.192    -5.39**

Instructional strategies
We practice adding, subtracting, multiplying, and               -0.263           1.067    -0.25
  dividing without using a calculator
We work on fractions and decimals                               -3.305           1.652    -2.00*
We interpret data in tables, charts, or graphs                  -6.820           1.567    -4.35**
We write equations and functions to represent                   18.309           1.443    12.69**
  relationships
We work together in small groups                                -3.411           1.638    -2.08*
We relate what we are learning in mathematics to our           -10.582           1.154    -9.17**
  daily lives
We explain our answers                                          -1.635           1.553    -1.05
We decide on our own procedures for solving                     -1.241           1.251    -0.99
  complex problems
We review our homework                                          12.708           1.530     8.31**
We listen to the teacher give a lecture-style presentation      -2.243           0.912    -2.46*
We work problems on our own                                      6.693           1.543     4.34**
We begin our homework in class                                   1.597           1.641     0.97
We have a quiz or test                                          -8.253           1.402    -5.89**
We use calculators                                               7.845           1.648     4.76**

Note: ** p < .01; * p < .05.

not initially understand a new topic in mathematics,
I know that I will never really understand it”) also 
obtained lower test scores. Interestingly, students who 
earned lower algebra test scores also reported that they 
enjoyed learning mathematics, needed to do well in 
mathematics to get the job they wanted, and thought 
learning mathematics would help in their daily lives. 

Ten instructional strategies also significantly 
entered the multiple regression equation. Students 
who earned the higher test scores reported that they 
frequently wrote equations and functions to represent
relationships during their mathematics lessons.
Students who showed higher algebra achievement 
also indicated that they frequently reviewed their 
homework and used calculators during mathematics 
lessons. Students who said they often worked problems 
on their own during mathematics class also tended to 
earn higher algebra test scores. 

Six instructional strategies showed significant 
negative relationships with algebra achievement. For 
instance, students who earned the lower test scores 
reported that they frequently worked on fractions and 
decimals and had a quiz or test during class. Similarly, 
students who indicated that they frequently engaged 
in cooperative learning activities (worked together in 
small groups) and interpreted data in tables, charts, 
and/or graphs also tended to earn lower algebra 
test scores. In addition, students who showed lower 
algebra achievement indicated they frequently listened 
to the teacher give a lecture-style presentation. These 
students also said they related what they were learning 
in mathematics to their daily lives. 

The overall multiple regression equation that 
assessed the joint significance of the complete set of 
mathematics beliefs and instructional strategies was 
significant (F(26,50) = 63.15, p < .001) and explained 
35.3% of the variance in algebra test scores for 
adolescent students in the United States. 

Discussion 

Several significant findings emerged from this study. 
For instance, a number of mathematics beliefs were 
significantly associated with algebra achievement for 
students in Japan and the United States. Students 
from both countries who indicated that they learned 
things quickly in mathematics and usually did well in 
mathematics tended to be those students who earned 
the higher algebra test scores. Students from both 
countries who earned higher test scores also indicated 
that they needed to do well in mathematics to get into 
the university of their choice. Conversely, students 
from both countries who earned the lower algebra 
test scores were more likely than students who earned 
higher scores to report that mathematics was not one of 
their strengths. Further, students from both countries 
who compared their ability in mathematics negatively 
with the ability of other students (“Mathematics is 
more difficult for me than for many of my classmates”) 
showed lower test scores. 

Differences were also noted for students from 
the United States and Japan. Students in Japan who 
reported they enjoyed learning mathematics tended 
to earn higher algebra test scores, while a negative 
relationship was found for students in the United 
States. In addition, students from the United States 
who expressed negative self-appraisals of their ability 
to learn new mathematics information (“Sometimes 
when I do not initially understand a new topic in 
mathematics, I know that I will never really understand 
it”) tended to earn lower algebra test scores; the same 
relationship was not significant for students in Japan. 

Several classroom instructional strategies were 
significantly associated with algebra achievement for 
students in both countries. Similarities and differences 
between students in the United States and in Japan 
were also found for these relationships. In regard 
to similarities, students from both countries who 
reported frequently working problems on their own 
during mathematics lessons also earned higher algebra 
test scores. Conversely, students from both countries 
who indicated they frequently engaged in cooperative 
learning activities (working together in small groups) 
also tended to earn lower test scores. Similarly, students 
from both countries who frequently related what they 
were learning in mathematics to their daily lives earned 
lower algebra test scores. 

Several differences in the relationship between 
instructional practices and algebra test scores also were 
noted for students in the United States and Japan. For 
instance, the more often students in Japan practiced 
basic mathematical operations (adding, subtracting, 
multiplying, and dividing) without using a calculator, 
the more likely they were to earn higher algebra test 
scores; the same relationship was not significant for 
students in the United States. In addition, students in 
Japan who frequently explained their answers during 
mathematics lessons earned higher test scores. This 
relationship was not significant for students in the 
United States. However, students in the United States 
who earned higher algebra test scores reported that they 
frequently wrote equations and functions to represent 
relationships. This association was not significant for 
students in Japan. Also, those students in the United 
States who frequently reviewed their homework 
during mathematics lessons were those who earned 
higher algebra test scores; the same relationship was 
not significant for students in Japan. 

Finally, students from the United States and 
Japan showed opposite trends for the relationship 
between calculator use during mathematics lessons 
and algebra test scores. Those students in the United 
States who reported frequently using calculators during 
mathematics lessons were also those more likely to earn 
higher test scores. Conversely, students in Japan who 
more frequently used calculators tended to earn lower 
test scores. 

Several of the relationships between self-beliefs 
and mathematics achievement found in this study 
are consistent with results from previous research. 
For example, House (1993) and Wheat, Tunnell, 
and Munday (1991) found significant relationships 
between academic self-concept and algebra course 
grades. Ma (2001) reported that students’ future 
expectations of success exerted a significant influence 
on enrollment in advanced mathematics. Similarly, 
House (2000) found achievement expectancies and 
academic self-concept to be significant predictors 
of student achievement in science, engineering, and 
mathematics. Also of interest is that the relationship 
between self-concept and mathematics achievement 
appears to become stronger as students reach higher 
class levels in school (Ma & Kishor, 1997). With respect 
to findings from the TIMSS assessments, Hammouri 
(2004) reported that student beliefs exerted significant 
direct effects on the mathematics test scores of students 
in Jordan. These findings emphasize the importance 
of considering student beliefs when assessing factors 
related to mathematics achievement. 

The results of the present study, in concert with 
the findings of other research, also have implications 
for mathematics teaching practices and the learning 
of mathematics. For instance, in the present study, 
students from both countries who reported frequent 
use of active learning (“We work problems on our 
own”) also tended to earn higher algebra test scores. 
Various classroom strategies exist that have the aim 
of developing mathematical connections, particularly 
through use of concrete models and practical examples. 
Work by Crocker and Long (2002) and Hines (2002) 
focuses on the use of practical exercises and physical 
models to teach students elementary functions and 
exponents. Smith (1999) notes that giving students the 
opportunity to reflect after engaging in active learning 
experiences helps them integrate the information with 
their existing knowledge. 

A second application of the results of the present 
study is selection of learning examples to foster 
positive attitudes toward mathematics. The results 
of this study show significant relationships between 
self-beliefs and algebra test scores for students in both 
countries. Boyer (2002) proposes several classroom 
activities that develop students’ self-management 
of their learning and improve their motivation for 
learning mathematics. Reimer and Moyer’s (2005) 
assessment of the effectiveness of using computer-based 
manipulatives for Grade 3 mathematics indicated 
improvement in students’ enjoyment of learning 
mathematics. Mathematics teachers have also reported 
that the use of hands-on projects during class results in 
increased motivation and participation (DeGeorge & 
Santoro, 2004). Similarly, the use of learner-centered 
classrooms appears to improve intrinsic motivation for 
mathematics learning (Heuser, 2000). 

The significant associations found in this present 
study between the algebra achievement of students 
in the United States and Japan, the beliefs these 
students held about mathematics, and the classroom 
instructional strategies they experienced provide 
several directions for further research. For instance, 
longitudinal research is needed to assess the effects 
of mathematics beliefs and instructional practices 
on other measures of student achievement, such as 
enrollment in advanced mathematics courses in high 
school. Findings from a recent study conducted by 
House (2005) show mathematics beliefs (perceptions 
that mathematics is an easy subject and students 
reporting that they enjoy learning mathematics) 
significantly related to “fractions” and “number sense” 
scores for adolescent students in Japan. Research is 
also needed to determine whether the effects of the 
self-beliefs and instructional activities noted in the 
present study hold for other measures of student 
achievement. For example, the United States and 
Japanese students in the present study who frequently 
used cooperative learning strategies were significantly 
more likely than were the other students in the two 
samples to have a low level of achievement in algebra. 
This finding differs from more recent results that show 
a positive significant relationship between frequent 
use of cooperative learning activities in science lessons 
and science test scores for elementary-school students 
in Japan (House, 2006b). Further study therefore is 
needed to clarify the relationship between cooperative 
learning and algebra achievement for students in the 

Abstract 

Closing achievement gaps between sub-populations 
in Israel, and amongst them between students in the 
Hebrew-speaking schools and the Arabic-speaking 
schools, continues to be one of the priorities of the 
country’s education system. TIMSS 2003 findings 
provide first evidence that efforts made during the 
1990s to close these gaps were in the right direction 
and that, although inequality in input between the two 
sectors still remains, gaps in learning outcomes have 
narrowed. This paper highlights the dynamics that led to 
the narrowing achievement gap. Instead of focusing on 
school conditions at one point in time (Zuzovsky, 2005), 
this paper looks for changes that occurred from 1999 
to 2003 in certain school/class-level variables in the two 
ethnic sectors and relates them to the uneven increase in 
achievement of these sectors. 



Background 

Closing achievement gaps between sub-populations 
in Israel, especially those related to socioeconomic, 
ethnic, or gender factors, continues to be one of 
the most prioritized goals of the national education 
system. The first two acts of education were enacted 
primarily to ensure that all students in Israel, without 
any discrimination on the basis of race, religion, 
or gender, would have the right to education 
(Compulsory Education Act 1949) and that the 
state would provide equal educational opportunities 
regardless of political or any other organizational 
affiliation (State Education Act 1953). These legislative 
acts resulted in a centralistic educational policy that 
issued in a strategy of equity among schools regarding 
both school inputs (e.g., a unified curriculum, equal 
numbers of students per class, equal weekly learning 
hours per grade level, equal number of learning days 
per year, similar teaching methods, etc.) and school 
outputs (e.g., equal percentage of students entitled to 
a matriculation certificate, equal levels of achievement, 
etc.). 

Toward the end of the 1960s, it became clear that 
this policy had not had the expected results, as large 
gaps in educational outcomes between students from 
different ethnic origins, usually also associated with 
socioeconomic status (SES), were persistently evident. 
The discourse on inequality in Israel’s education system 
in the 1960s, and even in the 1970s, focused mainly on 
inequality within the Jewish sector, between students 
of Sephardi and Ashkenazi¹ origin or between high- 
and low-SES students, and ignored the inequality 
between the Hebrew- and Arabic-speaking student 
population groups. The Arab population of Israel has 
what amounts to a separate education system within 
the larger Israeli one, in which students, teachers, 
and principals are all Arab citizens of Israel and the 
language of instruction is Arabic, not Hebrew. Only 
rarely during the 1960s and 1970s did policymakers 
and researchers deal with inequality between the Arab 
and Jewish populations in Israel (Mari, 1978; Mari & 
Dahir, 1978; Peled, 1976). 

The failure of the equity policy, as it was practiced 
within the Jewish sector, gave way to a policy of 
differential treatment and affirmative action for schools 
hosting “disadvantaged” students, which included 
financial benefits, differential curricula, special entrance 
thresholds for higher education institutions, special 
placement tests, and graded tuition fees for students 
from low-income families (Gaziel, Elazar, & Marom, 
1993). This affirmative policy, introduced in the late 
1960s, was implemented at a much later stage in 
Arabic-speaking than in Hebrew-speaking schools. 

1 Sephardi: A Jewish person of Spanish or Portuguese origin, now used loosely to refer to any Jewish person who is not of Northern and Eastern European 
(Ashkenazi) descent (also known as “Mizrachi”). 

During the 1980s, the general director of the 
Ministry of Education appointed several committees 
that generated plans to improve the education system 
in the Arab sectors (1981, 1985, mentioned in Gaziel, 
Elazar, & Marom, 1993). Their critique concerning 
discrimination (Al Haj, 1995; Bashi, Kahan, & Davis, 
1981; Kahan & Yelnik, 2000; Mazawi, 1996; Shavit, 
1990; Zarzure, 1995) led the Ministry of Education 
in the 1990s to announce two five-year plans for the 
Arabic-speaking sector that were mainly affirmative in 
nature. The first one started in the early 1990s and the 
second was launched in 1999. These projects aimed to 
improve all aspects of educational activity by increasing 
the number of students entitled to matriculation 
certificates, reducing the percentage of drop-outs, 
adding study hours, increasing auxiliary staff in 
schools, enhancing science and technology education, 
promoting special education services, and providing 
professional support for teachers and principals. The 
plans also included construction and development of 
school buildings. 

The affirmative five-year plans in the Arab sector 
resulted in several improvements. For instance, from 
1990 to 2001, enrolment rates of 14- to 17-year-olds 
in the upper elementary schools increased in the Arab 
sector by 26% compared to only 6% in the Jewish 
sector. More study time was allocated to Arabic- 
speaking schools, with an increase in the average hours 
per class and the average hours per student at all school 
levels, but especially at the upper secondary level, and 
more so in the Arab sector than in the Jewish sector 
(Sprinzak, Bar, Levi-Mazloum, & Piterman, 2003). 

Despite these improvements, inequalities between 
the two education systems in both inputs and 
outputs continued to appear (Lavi, 1997). An official 
publication (Sprinzak et al., 2003) on allocation of 
inputs and on outputs in the Jewish and Arab sectors in 
the wake of the Trends in International Mathematics and 
Science Study (TIMSS) 2003 revealed gaps between 
the two systems in favor of the former (see Table 1). 

Inequalities between the Jewish and Arab population 
groups go beyond the education system, encompassing 
other social aspects. According to official sources 
(Knesset Research and Information Center, 2004), the 
Arab sector is characterized by larger families, lower 
levels of parental education, lower income levels, 
higher ratio of families living below the poverty line, 
and lower percentage of employment (see Table 2). 

The data presented demonstrate the ongoing 
inequality between the two population groups, which 
is in line with the persisting achievement gaps between 
the two populations as intensively reported in recent 
national and international studies (Aviram, Cafir, & 
Ben Simon, 1996; Cafir, Aviram, & Ben Simon, 1999; 
Karmarski & Mevarech, 2004; Sprinzak et al., 2003; 
Zuzovsky, 2001, 2005). 

More than ever, the public education system is called 
upon to eliminate this inequality. The recent National 
Task Force for the Advancement of Education (Dovrat 
Committee, 2005) specifically refers to this inequality 
in some of its recommendations: 

Arab education shall have full budgetary equality based 
on uniform, differential per-student funding, like the rest 
of the educational system. Disparities between Jewish 
and Arabic schools with regard to buildings and physical 
infrastructure shall be eliminated in an effort to reduce, 
as quickly as possible, the disparities in educational 
achievement, including the percentage of students 
who graduate from high school and the percentage of 



Table 1: Input and Output Indicators in Hebrew and Arabic Education 

                                                                         Hebrew sector   Arab sector
Average no. of hours per student*                                        1.97            1.64
No. of students per FTP**                                                8.6             11.6
Average no. of students per class                                        26              29
Enrolment rates, ages 14-17 (%)                                          96              80.5
Percentage of Grade 12 students entitled to matriculation certificates   52.3            45.6
Annual student dropout rate between Grades 9 and 12 (%)                  4.9             9.8

Note: * Total school hours/number of students; ** FTP = Full-time teacher's position. 

Source: Sprinzak et al. (2003): Tables C9, C11, C23, F1. 

Table 2: Demographic Characteristics of Jewish and Arab Population Groups in Israel (2003) 

                                                        Hebrew population   Arab population
Percentage of women with upper secondary education      42.0                22.8
Percentage of men with upper secondary education        37.9                23.4
Average no. of children per household                   3.13                5.06
Percentage of families below poverty line               17.7                44.7
Percentage of employed persons                          57.0                39.0

Source: Knesset Research and Information Center (2004). Background report on school outcomes in the Arab sector presented to the Child 
Rights Committee in the Knesset. 



students eligible for matriculation certificates. (Dovrat 
Summary Report, Dovrat Committee, 2005, p. 29) 

Surprisingly, despite the reported inequalities that 
could explain a continuing achievement gap, findings 
from the last TIMSS study (2003) indicated not only 
that Israel’s mean scores in mathematics and science 
had increased significantly since 1999 (a gain of 30 score 
points in mathematics and 20 score points in science), 
but also that this increase was much more pronounced in the 
Arabic-speaking schools than in the Hebrew-speaking 
ones: about three times greater in mathematics (68 score 
points vs. 23) and about five times greater in science (64 score 
points vs. 12). As a result of this uneven increase, the 
achievement gap between the two sectors in favor of 
the Jewish sector narrowed from about one standard 
deviation in 1999 to 0.5 SD in mathematics and 0.45 
SD in science in 2003 (Zuzovsky, 2005). 
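
The "three times" and "five times" comparisons can be checked directly from the reported gains. The short sketch below uses only the four gain figures quoted in the text; the function names are illustrative and not part of the original analysis.

```python
# Score gains from 1999 to 2003 as quoted in the text (score points).
gains = {
    "mathematics": {"arabic": 68, "hebrew": 23},
    "science": {"arabic": 64, "hebrew": 12},
}


def gain_ratio(subject):
    """Ratio of the Arabic-sector gain to the Hebrew-sector gain."""
    g = gains[subject]
    return g["arabic"] / g["hebrew"]


def gain_difference(subject):
    """Difference between the two sector gains, in score points."""
    g = gains[subject]
    return g["arabic"] - g["hebrew"]


for subject in gains:
    print(subject, round(gain_ratio(subject), 2), gain_difference(subject))
```

The difference between the two gains is the number of score points by which the gap between the sectors narrowed over the period.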

This uneven increase was the trigger for a previous 
study (Zuzovsky, 2005) that aimed to identify the web 
of effects that narrowed the achievement gap between 
the two sectors in 2003. Higher frequencies of certain 
instructional variables that were also found to be more positively 
associated with achievement in the Arab sector, together with 
relatively higher schooling effects in less economically 
developed sectors such as the Arab sector in Israel, 
explained why students from this sector performed 
better in 2003. 

The present study is a further step in studying the 
narrowing achievement gap between the two sectors. 
Rather than looking for school-level variables that 
have a differential effect on achievement in the two 
sectors at one point in time, the present work aims 
to look for changes that occurred in these variables in 
the two sectors from 1999 to 2003, and to relate them 
to changes in achievement that occurred in the two 
sectors during this period. 

A possible interaction effect that the two major 
independent variables in this study, the ethnic 
affiliation of the school (“sector”) and the year of 
testing (“year”), might have on the variability of 
the achievement scores was explored and confirmed 
at an initial phase of the study. Findings from a 
two-way, between-group analysis of variance, and 
later findings from a multilevel regression analysis, 
revealed a significant interaction effect of sector*year 
on achievement. It remained for this study to look for 
the variables that affect this interaction effect in a way 
that favors the achievement gains of students in the 
Arabic-speaking schools. 

In line with this aim, the following question was 
raised: Which are the relevant school/class contextual 
variables whose frequency and/or association with 
achievement changed differently in the two sectors from 
1999 to 2003 in a way that results in higher achievement 
gains of students in Arabic-speaking schools as compared 
to the gains of students in Hebrew-speaking schools? 

Method 

Answering this question required several steps. The 
first of these delineated those contextual school-level 
variables whose frequency changed differently in 
the two sectors over the years. A series of two-way, 
between-groups analyses of variance were employed to 
identify significant joint effects (two-way interaction 
effects) of sector and year on the frequency or means 
of selected school/class contextual variables. The 
contextual school-level variables that were delineated 
in this first step served the second step of the analysis, 
which aimed to look at whether these variables affect 
the interaction effect that sector and year have on the 
variability of students’ outcomes (that is, looking for a 
three-way interaction effect). 
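
The quantity that a sector-by-year interaction term tests can be illustrated as a difference of differences between cell means. The sketch below is a minimal illustration with hypothetical class-level frequencies. It assumes the coding used later in the paper (sector: 0 = Arabic-speaking, 1 = Hebrew-speaking; year: 0 = 1999, 1 = 2003) and computes only the contrast, not the F test of the analysis of variance.

```python
from statistics import mean


def cell_means(records):
    """Mean of the value in each (sector, year) cell."""
    cells = {}
    for sector, year, value in records:
        cells.setdefault((sector, year), []).append(value)
    return {key: mean(vals) for key, vals in cells.items()}


def interaction_contrast(records):
    """Change over time in sector 0 minus change over time in sector 1."""
    m = cell_means(records)
    change_sector_0 = m[(0, 1)] - m[(0, 0)]
    change_sector_1 = m[(1, 1)] - m[(1, 0)]
    return change_sector_0 - change_sector_1


# Hypothetical data, for illustration only: (sector, year, frequency).
records = [
    (0, 0, 2.0), (0, 0, 2.2), (0, 1, 2.8), (0, 1, 3.0),
    (1, 0, 2.5), (1, 0, 2.7), (1, 1, 2.5), (1, 1, 2.7),
]
print(interaction_contrast(records))
```

A contrast far from zero means that the frequency of the variable changed differently in the two sectors, which is the pattern the first screening step looks for.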

Acknowledging the hierarchical nature of the 
education system and the simultaneous, non-additive 
dependence of achievement on variables acting at each 
level of this hierarchy, I have adopted a multilevel and 
interactive approach to study these effects (Aitkin 
& Zuzovsky, 1994), applying HLM5 (hierarchical 
linear and non-linear modeling) software (Bryk & 
Raudenbush, 1992; Raudenbush, Bryk, Cheong, & 
Congdon, 2000). 

As the sampling procedure of TIMSS allowed 
the sampling of at least one class from each sampled 
school (as was done in Israel), the school- and class-level 
effects are confounded. Thus, the models that 
were specified for the analyses were two-level models 
of students nested in schools/classes. The possibility of 
considering year of testing as an additional hierarchical 
level, as was done in the case of cross-sectional studies 
with repeated observation for schools or students 
(Aitkin, 1988; Raudenbush, 1989), was rejected, as 
the sampling design in IEA studies provides neither 
repeated measures of schools nested in time, nor 
of students nested within schools over time. The 
alternative models specified contained, at their first 
level, five variables describing student characteristics 
(determined as having fixed effects on the variability 
of achievement).² 

The second-level variables included (in all 
alternative models specified) both sector and year as 
well as their interaction terms. These two variables were 
treated as school-level dummy variables. Year describes 
the global functioning and conditions at two points 
in time: 1999 (0) and 2003 (1), and sector indicates 
whether the school is Arabic-speaking (0) or 
Hebrew-speaking (1).³ In addition to these two “constantly 
appearing” school-level variables, only one of a series 
of selected school/class-level contextual variables and 
all its possible two- and three-way interaction terms 
with the former two predictors were introduced 
into the model. The reason for employing only one 
additional school/class-level variable in each model 
stemmed from computational constraints on the 
number of model parameters allowed to be included 
in the regression equation. In interactive models that 
include several school-level variables and all possible 
relevant interaction terms between them, there is 
a limit to the number of the school-level variables 
allowed.⁴ The criteria used to select the school/class 
contextual variables and the process of selecting them 
are described further on in this paper. 

To avoid problems of multicollinearity and to 
maximize interpretability, the school-level contextual 
predictors were standardized around their grand mean 
(see Aiken & West, 1991, p. 43). The regression models 
specified for each outcome score in all HLM analyses 
were, in equation format, the following: 

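Level 1 Model 

\begin{align*}
Y ={} & \beta_0 + \beta_1(\text{number of books in student's home, BSBGBOOK}) \\
 & + \beta_2(\text{level of education student aspires to finish, BSBGHFSG}) \\
 & + \beta_3(\text{academic education, ``fathers'', ACADEM\_F}) \\
 & + \beta_4(\text{academic education, ``mothers'', ACADEM\_M}) \\
 & + \beta_5(\text{number of people in student's home, BGPLHO1}) + R
\end{align*}

Level 2 Model 

\begin{align*}
\beta_0 ={} & \gamma_{00} + \gamma_{01}(\text{Sector}) + \gamma_{02}(\text{Year}) + \gamma_{03}(\text{Sector} \times \text{Year}) \\
 & + \gamma_{04}(\text{standardized relevant school variable}) \\
 & + \gamma_{05}(\text{Year} \times \text{standardized relevant school variable}) \\
 & + \gamma_{06}(\text{Sector} \times \text{standardized relevant school variable}) \\
 & + \gamma_{07}(\text{Year} \times \text{Sector} \times \text{standardized relevant school variable}) + U
\end{align*}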
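
The construction of the Level 2 predictors can be sketched as follows. This is an illustration under stated assumptions, not the HLM5 procedure itself: the school records are hypothetical, the function names are invented for the example, and the population standard deviation is assumed for the standardization because the paper does not say which form was used.

```python
from statistics import mean, pstdev


def standardize(values):
    """Center values on their grand mean and scale by the standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def level2_row(sector, year, z):
    """Level 2 predictors in the order of gamma_00 through gamma_07."""
    return [1, sector, year, sector * year, z, year * z, sector * z, year * sector * z]


# Hypothetical schools, for illustration only: (sector, year, raw school variable).
schools = [(0, 0, 1.8), (0, 1, 2.9), (1, 0, 2.4), (1, 1, 2.5)]

z_scores = standardize([raw for _, _, raw in schools])
design = [level2_row(s, y, z) for (s, y, _), z in zip(schools, z_scores)]
for row in design:
    print(row)
```

Because the school variable is centered, the coefficients for sector, year, and their product describe a school at the grand mean of that variable.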

Models that exhibited a significant three-way 
interaction were then used to predict mathematics and 
science achievements in the two sectors at two points 
in time, conditioned on different values of the relevant 
school variable involved. Three possible values of these 
variables on their standardized scale were used for these 
predictions: (1) the minimal value, (2) the actual mean value, 
set to 0, and (3) the maximal value. Plotting the predicted 
outcomes highlighted the role of these contextual 
variables in the narrowing of the achievement gap 
that occurred from 1999 to 2003 between students 
in Hebrew-speaking schools and students in Arabic- 
speaking schools. 
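
The conditional predictions described above can be sketched from the Level 2 equation. The coefficients below are hypothetical and chosen only to show the mechanics, the z values of -2, 0, and 2 are stand-ins for the minimal, mean, and maximal standardized values, and the Level 1 covariates are left out, so the numbers carry no empirical meaning.

```python
def predicted_intercept(gamma, sector, year, z):
    """Level 2 fitted value for a school with the given sector, year, and z."""
    terms = [1, sector, year, sector * year, z, year * z, sector * z, year * sector * z]
    return sum(g * t for g, t in zip(gamma, terms))


def sector_gap(gamma, year, z):
    """Hebrew-speaking (sector 1) minus Arabic-speaking (sector 0) prediction."""
    return predicted_intercept(gamma, 1, year, z) - predicted_intercept(gamma, 0, year, z)


# Hypothetical coefficients gamma_00 through gamma_07, for illustration only.
gamma = [400.0, 80.0, 60.0, -40.0, 5.0, 10.0, -2.0, -6.0]

for z in (-2.0, 0.0, 2.0):
    print(z, sector_gap(gamma, 0, z), sector_gap(gamma, 1, z))
```

With a non-zero three-way coefficient, the change in the gap between 1999 and 2003 depends on z, which is what the plotted predictions are meant to display.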

Data source 

The data source that served the analyses was obtained 
from the TIMSS 1999 and TIMSS 2003 studies in 
Israel. A total of 4,195 Grade 8 students from 139 
schools participated in the study in 1999. Of these 
students, 3,383 studied in 112 Hebrew-speaking 
schools and 812 in 27 Arabic-speaking schools. In 
2003, these numbers were 4,318 Grade 8 students, of 
whom 3,163 studied in 108 Hebrew-speaking schools 
and 1,098 in 38 Arabic-speaking schools. 

2 The regression coefficients of all first-level student variables employed in the analyses were found to have a non-significant random effect over schools; thus, they 
were specified as having fixed effects. 

3 This distinction has several implications in terms of socioeconomic and cultural differences between the two populations. 

4 As a general principle, the number of model parameters should be at least an order of magnitude smaller than the number of cases. 
With about 245 schools participating in the 1999 and 2003 studies, the second-level model should not contain more than 24 regression 
terms. 

Most of the school/class contextual variables were 
derived from student questionnaires. I chose this 
data source in order to limit missing data problems. 
These variables described students’ background 
characteristics and students’ perceptions regarding 
the instruction they received. When these variables 
are aggregated or averaged at the class level, they 
provide us with valid class-level measures of student 
body composition and prevalent modes of instruction. 
Some data were obtained from principal and teacher 
questionnaires. As more than one teacher taught the 
sampled classes, responses of teachers teaching the 
same class were also averaged. 
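
The aggregation step can be sketched as a class-level mean of student responses. The class identifiers and response values below are hypothetical, and skipping missing responses is an assumption made for the illustration rather than a documented detail of the original procedure.

```python
from collections import defaultdict
from statistics import mean


def class_means(responses):
    """Average student responses within each class, skipping missing values."""
    by_class = defaultdict(list)
    for class_id, value in responses:
        if value is not None:
            by_class[class_id].append(value)
    return {class_id: mean(values) for class_id, values in by_class.items()}


# Hypothetical (class id, response) pairs; None marks a missing response.
responses = [("A", 3), ("A", 4), ("A", None), ("B", 2), ("B", 2)]
print(class_means(responses))
```

The same function could average the responses of several teachers who teach the same class.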

As the HLM5 software (Raudenbush et al., 2000) 
assumes a complete set of data, it provides two options 
for handling missing data problems for Level 1: pairwise 
and listwise deletion of cases. In this study, pairwise 
deletion was applied to Level 1 (student) variables. 
However, at the second level (school/class), in order 
to use all available data, listwise deletion of cases was 
needed. At the end of this process, full sets of school- 
level data relevant to the study were obtained describing 
about 246 schools when regressing mathematics scores 
and about 243 schools when regressing science scores. 
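
The difference between the two deletion rules can be shown on a small example. This sketch illustrates the general idea only and is not the HLM5 implementation; the records and variable names are hypothetical.

```python
def listwise(rows):
    """Keep only rows with no missing value on any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise(rows, var_a, var_b):
    """Keep rows in which the two variables of interest are both observed."""
    return [r for r in rows if r[var_a] is not None and r[var_b] is not None]


# Hypothetical student records; None marks a missing value.
rows = [
    {"books": 3, "aspire": 4, "score": 510.0},
    {"books": None, "aspire": 2, "score": 430.0},
    {"books": 2, "aspire": None, "score": 470.0},
]
print(len(listwise(rows)), len(pairwise(rows, "aspire", "score")))
```

Listwise deletion drops a case as soon as any variable is missing, whereas pairwise deletion keeps a case for every pair of variables on which it is complete.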

The regressed outcome scores were estimates of the 
first plausible score⁵ on the mathematics and on the 
science scales. 

Selection of potential explanatory contextual 
school/class variables 

Since this study deals with changes that occurred 
from 1999 to 2003 in the frequency and effectiveness 
of certain school/class-level contextual variables, the 
analyses conducted were limited only to those variables 
that appeared in both studies and were identical or 
only slightly adjusted. 

This set of variables (about 60 in total), which 
described students and their home characteristics, 
student class composition, and instruction and school 
characteristics, already reflects theoretical as well as 
empirical input generated in earlier cycles of TIMSS 
studies. Out of this set of variables, a restricted set (33 
variables of theoretical importance and with evidence 
of being related to achievement in Hebrew-speaking 
schools or in Arabic-speaking schools) was selected 
to go through the next screening process, which 
aimed to look for those variables whose frequency 
changed differently over time in the two sectors. 
This process was based mainly on the results of the 
two-way, between-group analyses of variance where 
significant joint effects of sector and year on measures 
of frequency of several school/class-level variables were 
detected. Additional data on changes that occurred 
in the association of these variables with achievement 
supported the choice of a final set of 17 school/class- 
level variables that fulfilled both or at least one of the 
two requirements: showing a significant differential 
change in their association with science or mathematics 
achievement over the years in the two sectors and/or 
showing a significant differential change in their 
frequency over time in the two sectors. These variables 
were then fitted, along with sector and year, into two- 
level interactive regression models, thus serving the 
last step in the analyses, aimed at revealing significant 
three-parameter interaction effects on achievement. 
The appendix to this paper presents a description of all 
variables used in the advanced stages of the analyses. 

Results 

The results are presented below with reference to the 
different steps of the analysis. 

Delineating school-level variables whose frequency 
changed differently over time in the two sectors 

Tables 3 and 4 display combined findings from a two- 
way analysis of variance of the frequency of school 
contextual variables (dependent variables) by sector 
and year and correlational findings on their association 
with achievement in the two sectors over the years 
(Pearson correlations and significance). 

In general, the frequency of many school/class 
variables changed significantly in the two sectors over 
time. This is indicated by the significant interaction 
effect of sector*year on their frequency. These 
significant interactions, when coupled with changes 
in the association of these variables with achievement, 
provide us with some clues regarding their role in 
narrowing the achievement gap between the sectors. 



5 Proficiency score used in IEA studies. For more details, see the TIMSS 2003 technical report. 

The increase over time in the percentage of 
mothers with an academic education (ACADEM_M) 
(positively associated with achievement) in Arabic- 
speaking schools and the decline in the number of 
people living at the student’s home (BGPLHO1) 
(negatively associated with achievement) in Arabic- 
speaking schools point to an improvement, over the 
years, of home conditions associated with achievement 
in the Arab sector. 

An increase in the frequency of several modes 
of mathematics instruction occurred in the Arabic- 
speaking schools, while in the Hebrew-speaking 
schools, their frequency either did not change or 
slightly decreased. This is also the case regarding 
the opportunities given to students to review their 
homework (CMHROH1), to work out problems on 
their own (CMHWPO1), and to relate what is learnt
to daily life (CMHMDL1). Reviewing homework 
(CMHROH1), which was negatively associated with 
achievement in 1999, turned out, in 2003, to be 
positively and significantly associated with achievement 
in both sectors. 

Relating what is learnt to daily life (CMHMDL1)
and working out problems on the student’s own
(CMHWPO1), which were negatively associated with
mathematics achievement in 1999 in both sectors,
remained so in Hebrew-speaking schools in 2003,
but became positively associated with achievement in
2003 in the Arabic-speaking schools.

Similarly, in science, from 1999 to 2003, students in 
Arabic-speaking schools had increased opportunities to 
practice inquiry-oriented modes of instruction (watch 
their teachers demonstrate experiments (CSDEMO1),
conduct experiments on their own (CSEXPER1), and 
work in small groups on experiments or investigations 
(CSSGRP1)). These modes of instruction became 
less frequent over the years in the Jewish sector. In 
2003, they were all slightly, although not significantly, 
positively associated with achievement in the Arab 
sector. 

Changes that occurred from 1999 to 2003 in the
frequency of conducting experiments in the
Arabic-speaking schools illustrate this trend. While there
was no change in the frequency with which students
in Hebrew-speaking schools conducted experiments,
this activity did become more frequent in the
Arabic-speaking schools. The positive association this mode
of instruction had in 1999 with science achievement
in Hebrew-speaking schools disappeared in 2003,
while in Arabic-speaking schools, it was the negative
association of this mode of instruction with science
achievement that disappeared. By 2003, frequently
conducting experiments in the class was no longer
improving Hebrew-speaking students’ achievement,
and no longer adversely affecting student achievement
in Arabic-speaking schools.

The frequency of testing in mathematics
(CMHHQT1) and science classes (CSTEST1) was
similar in both sectors and remained constant over 
the years. In 1999, frequent testing was negatively 
associated with mathematics and science achievement, 
and it became more so in the Jewish sector in 2003. In 
the Arab sector, it became less negatively associated with 
science achievement, and even positively associated 
(albeit not statistically significantly) with mathematics 
achievement. Toward 2003, frequent testing seemed to 
undermine student achievement in Hebrew-speaking 
schools whereas it harmed students in Arabic-speaking 
schools less, or even slightly benefited them. 

Improvement in school conditions from 1999 to
2003, as indicated by a decrease in the mean of the
index of shortages that affect instruction (on a scale of
1 = does not affect to 4 = affects a lot), occurred in both
sectors, although more so in the Arab sector. In 1999, high
levels of these indices (poor resources), whether for
general school resources (RESOUR1) (especially in
Arabic-speaking schools) or for mathematics resources
(RESOUR2) (in both types of schools), were negatively
associated with achievement. In 2003, this situation
remained evident only in the Hebrew-speaking
schools. In the Arabic-speaking schools, shortage of
general school resources turned out, in 2003, not to be
significantly associated with achievement, while high
levels of shortage in mathematics or science resources
even became positively associated with achievement.
It seems that the improved conditions for science
and mathematics learning in Arabic-speaking schools
toward 2003 were not necessarily associated with
improved achievement.

Changes in two indicators of school climate that
limit instruction, both negatively associated with
achievement, occurred differentially in the two sectors.
The severity of late arrivals at school (BCBGSP01)
(1 = not at all; 3 = very severe) decreased from 1999
to 2003 in the two sectors and especially in
Arabic-speaking schools. Teachers’ assessment regarding
limits on teaching due to disruptive student behavior
(CTMGLT06, CTSGLT06) (on a scale of 1 = not at
all to 4 = a lot) increased toward 2003 in the Jewish
sector, but there was no change in the Arab sector. These
are signs of improvement in school climate indicators
in the Arab sector. In addition, these variables also
became, toward 2003, more negatively associated with
achievement in the Hebrew-speaking schools, while
less negatively associated with achievement in the
Arabic-speaking schools. These changes also indicate
that the negative aspects of school climate have less
effect on the achievement of students in
Arabic-speaking schools.

[Tables 3 and 4 appeared here. The surviving caption fragment reads
"... the Interaction Effect of Sector*Year on their Measures"; the table
bodies are not recoverable from the source text.]

R. ZUZOVSKY: ACHIEVEMENT GAP BETWEEN HEBREW-SPEAKING AND ARABIC-SPEAKING SCHOOLS IN ISRAEL

School-level variables involved in a three-way 
interaction with sector and year 

School-level variables that changed in the two sectors
in their frequency and/or association with achievement
from 1999 to 2003 were fitted along with sector and
year into two-level HLM models. Tables 5 and 6
summarize the results of these analyses. The models
are those that exhibited statistically significant (p <
.05 or near) two- or three-way interactions between
the relevant school-level variable and the other two
school-level predictors. The models, in mathematics,
contained variables describing instruction: students
working in groups (ZCMHWSG1); reviewing
homework (ZCMHROH1); relating what is learnt
to daily life (ZCMHMDL1); working out problems
on their own (ZCMHWPO1); and having quizzes
or tests (ZCMHHQT1). In the case of science, the
models contained variables describing instruction:
students conducting experiments (ZCSEXPER1)
and providing explanations (ZTSCSWE). As all
school-level variables employed in these analyses were
standardized, their mean values were set to 0. Negative
values of these variables indicate less frequent use
of these modes of instruction while positive values
indicate more frequent use.
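
The standardization mentioned above can be illustrated with a short sketch.
It assumes ordinary z-scores computed over school-level values with the
population standard deviation; the source does not say which standard
deviation or which reference group was used, so both choices, and the
function name, are assumptions.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (value - mean) / standard deviation.

    After standardization the mean is 0, so negative scores mark schools
    below the average frequency and positive scores mark schools above it.
    """
    centre = mean(values)
    spread = pstdev(values)  # population SD; an assumption, see lead-in
    if spread == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - centre) / spread for v in values]


# Hypothetical school-level frequencies of one instructional practice.
z_scores = standardize([1.0, 2.0, 3.0, 4.0, 5.0])
```

A coefficient attached to such a variable is then read as the change in
achievement for a school one standard deviation above the average school.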

Other variables included in the models describe
the levels of effect that shortage in general school
resources has on school capacity to provide instruction
(ZRESOUR1); the levels of effect that shortage in
mathematics resources (ZRESOUR2) or in science
resources (ZRESOUR3) has on school capacity to
provide instruction; severity of late arrivals at school
as reported by principals (1 = not a problem, 3 = a
serious problem) (ZGSP01); and “limit to teaching”
due to disruptive student behavior as reported by
mathematics teachers (ZTMGLT06), and as reported
by science teachers (ZTSGLT06) (1 = does not limit
teaching, 4 = limits teaching a lot). Here, too, the mean
values of these variables were set to 0. Negative values
on the standardized scale of these school variables
indicate good school climate; positive values indicate
bad school climate.

The most important outputs of these analyses were
estimates of the regression coefficients of all terms
involved in the models, and their standard errors.
In the presence of interactions, the
regression coefficients of first-order terms (i.e., of sector,
year, and the relevant school variables) do not represent
“main effects” or constant effects across all values of the
other variables. Rather, they represent the effects of the
predictor at the mean of the other predictors (Aiken &
West, 1991, pp. 38-39). As all school-level variables
were standardized, the coefficients of the interaction
terms indicate an increase or decrease in achievement
due to change by one standard deviation on the scale
of the school variable, above or below its mean value.
The interpretation of these regression parameters and
coefficients should be as follows:

• The intercept in each of the models indicates the
mean achievement attained by students in
Arabic-speaking schools at the mean level of the relevant
standardized school variables.

• γ01 represents additional achievement score points
attained by students in Hebrew-speaking schools in
1999 at the mean level of the relevant standardized
school variables, as compared to the achievement
of students in Arabic-speaking schools (sector gap
in favor of Hebrew-speaking schools).

• γ02 represents additional achievement score points
attained by Arabic-speaking schools in 2003 at
the mean level of the standardized relevant school
variables as compared to their achievement in 1999
(year gain in favor of Arabic-speaking schools).

• γ03 indicates an interaction effect between
sector and year, and it represents the change in
achievement score points from 1999 to 2003 of
students in Hebrew-speaking schools at the mean
level of the standardized relevant school variables
compared to change in achievement score points
of students in Arabic-speaking schools during this
period under the same conditions (year gains of
Hebrew-speaking schools compared to those of
Arabic-speaking schools).

• γ04 indicates change in achievement score points
of students in Arabic-speaking schools in 1999
due to an increase of 1 SD above the standardized
mean value of the relevant standardized school
variables.

• γ05 indicates an interaction effect of the relevant
school variable and year, and it represents a change
in achievement score points from 1999 to 2003 in
the Arab sector due to an increase of 1 SD above
the standardized mean value of the relevant school
variables.

• γ06 indicates an interaction effect of the relevant
school variable and sector, and it represents
additional score points that students in
Hebrew-speaking schools achieved in 1999 due to an
increase of 1 SD above the standardized mean of
the relevant school variable compared to the gains
of students in Arabic-speaking schools under the
same conditions.

• γ07 indicates a three-way interaction effect of
sector, year, and the relevant school variable, and
represents change in achievement score points
from 1999 to 2003 in the Hebrew-speaking sector
versus that in the Arab sector due to an increase of
1 SD above the mean of the standardized relevant
school variable.

[Notes to the regression tables: p ≤ 0.07; *p ≤ 0.05; **p ≤ 0.01; ***p ≤ 0.001.
BSV = Between school variance; WSV = Within school variance.]
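
The coefficient definitions listed above imply a school-level equation of
roughly the following form, with γ00 for the intercept and γ01 to γ07 for the
seven terms in the order given. This is a reconstruction from the verbal
definitions, not an equation printed in the source. The dummy coding (sector:
0 = Arabic-speaking, 1 = Hebrew-speaking; year: 0 = 1999, 1 = 2003), the
symbols S, T, Z, and X, and the generic student-level term are assumptions.

```latex
% Level 1: student i in school j; X_{qij} are the student-level covariates
\begin{align*}
Y_{ij} &= \beta_{0j} + \sum_{q} \beta_{q} X_{qij} + r_{ij} \\[4pt]
% Level 2: S_j = sector, T_j = year, Z_j = standardized school variable
\beta_{0j} &= \gamma_{00} + \gamma_{01} S_j + \gamma_{02} T_j
            + \gamma_{03} S_j T_j + \gamma_{04} Z_j \\
           &\quad + \gamma_{05} Z_j T_j + \gamma_{06} Z_j S_j
            + \gamma_{07} Z_j S_j T_j + u_{0j}
\end{align*}
```

Under this coding, the predicted 1999 to 2003 gain at a given value z of the
school variable is γ02 + γ05·z in Arabic-speaking schools and
(γ02 + γ03) + (γ05 + γ07)·z in Hebrew-speaking schools, so a negative γ07
means that the Hebrew-sector gain falls further behind as z increases.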

Applying these interpretations to the results
presented in Tables 5 and 6 reveals some similarities.

Sector and year (γ01, γ02) were found to have
significant positive effects in all analyses, pointing
to higher achievement in 1999 in Hebrew-speaking
schools than in Arabic-speaking schools at
the mean of all standardized school variables employed
in the analyses, and to higher achievement in
2003 than in 1999 in Arabic-speaking schools at the
standardized mean of all school variables.

The significant negative two-way interaction effects
of sector and year (sector*year) (γ03) that were found in all
models analyzed indicated lower gains in achievement
from 1999 to 2003 of students in the Hebrew-speaking
schools compared to the gain of students in
Arabic-speaking schools due to the changes that occurred in
the relevant school-level variables during this period.

The positive two-way interaction effects between
the selected school variables and year (γ05) indicated an
increase in the achievement from 1999 to 2003 in the
Arab sector due to an increase of 1 SD above the mean
of the standardized school variable.

The positive two-way interaction effect between the
selected school variable and sector (γ06) represented the
additional score points students in Hebrew-speaking
schools achieved in 1999 due to an increase of 1 SD
above the mean of the standardized school variables
compared to gains of students in Arabic-speaking
schools under the same conditions.

All coefficients of the three-way interaction effect
(γ07) in Tables 5 and 6 were negative,
indicating a decrease in the achievement
gains,⁶ from 1999 to 2003, of students in
Hebrew-speaking schools compared to gains of students in
Arabic-speaking schools (higher year gains in the Arab
sector).

The significant three-way interaction terms
found in the mathematics models were with testing
(ZCMHHQT1) and, at a weaker significance level
(p < .08), with relating what is learnt to daily life
(ZCMHMDL1). Similarly, significant three-way
interaction terms in the mathematics models were
found with the indices of the effect on school capacity
to provide instruction due to shortage in general school
resources (ZRESOUR1), and in mathematics school
resources (ZRESOUR2), as well as with variables
describing the limit to teaching due to disruptive
student behavior (ZTMGLT06).

In the science models, the school variables involved
in significant three-way interaction with sector and year
were the frequency with which students conducted
experiments (ZCSEXPER1), how often their teachers
asked them to provide explanations (ZTSCSWE),
shortage in general school resources (ZRESOUR1),
and the limit to teaching due to disruptive student
behavior in science classes (ZTSGLT06). An increase
of 1 SD above the standardized mean of these variables
resulted, in both subject areas, in lower gains from 1999
to 2003 in the achievement of students in
Hebrew-speaking schools compared to the achievement gains
of students in Arabic-speaking schools.

6 Due to an increase of 1 SD above the mean value of the standardized selected school variables involved in the analyses.

Probing the three-way interaction terms

Regressing science and mathematics scores by means
of hierarchical models that contain, beyond
student-level variables, three second-level variables (i.e., sector,
year, and one meaningful school contextual variable),
as well as all interactions among them,
allowed the computation of predicted outcomes in
1999 and 2003 for students in Arabic-speaking schools
and for students in Hebrew-speaking schools. These
computations are based on three values of the relevant
school-level variables: maximal, mean, and minimal.
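
The computation of predicted outcomes described above can be sketched as
follows. This is an illustration under stated assumptions, not the author's
procedure: it assumes dummy coding (sector 0 = Arabic-speaking, 1 =
Hebrew-speaking; year 0 = 1999, 1 = 2003), ignores student-level covariates,
and uses invented coefficient values rather than those in Tables 5 and 6.
The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Coefs:
    g00: float  # intercept: Arabic-speaking schools, 1999, Z at its mean
    g01: float  # sector
    g02: float  # year
    g03: float  # sector * year
    g04: float  # school variable Z
    g05: float  # Z * year
    g06: float  # Z * sector
    g07: float  # Z * sector * year


def predict(c, sector, year, z):
    """Predicted score for a sector (0/1), year (0/1), and z-score of Z."""
    return (c.g00 + c.g01 * sector + c.g02 * year + c.g03 * sector * year
            + c.g04 * z + c.g05 * z * year + c.g06 * z * sector
            + c.g07 * z * sector * year)


def year_gain(c, sector, z):
    """Predicted change from 1999 to 2003 within one sector."""
    return predict(c, sector, 1, z) - predict(c, sector, 0, z)


def sector_gap(c, year, z):
    """Predicted Hebrew-speaking minus Arabic-speaking score in one year."""
    return predict(c, 1, year, z) - predict(c, 0, year, z)


# Invented coefficients, chosen only to show the pattern of signs.
example = Coefs(400.0, 80.0, 50.0, -40.0, -5.0, 10.0, 8.0, -15.0)

# Evaluate at a low, mean, and high value of the standardized variable.
gaps_2003 = {z: sector_gap(example, 1, z) for z in (-1.0, 0.0, 2.0)}
```

With a negative three-way coefficient, as in this invented example, the
predicted 2003 gap shrinks as the school variable moves from its minimal to
its maximal value.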

The following tables present the predicted 
outcomes in mathematics (Table 7) and in science 
(Table 8). I have chosen to present findings from 
these simulations related to variables that describe 
instruction or school resources and learning climate, 
as these variables are relatively easy to manipulate and 
probing their effect can enable the formulation of 
policy recommendations. 

The predicted outcomes show, for both school
subjects, that at the mean and the maximal values of
the selected variables, student achievement gains from
1999 to 2003 (year gains) were much more pronounced in
Arabic-speaking schools. As a result, the achievement
gap between the sectors in favor of Hebrew-speaking
schools (sector gains) narrowed over time and even
changed direction to be in favor of Arabic-speaking
schools.

In mathematics at the mean and maximal values 
of some instructional variables (testing, relating what 
is learnt to daily life, working out problems on one’s 
own, reviewing homework), the year gains of students 
in the Arabic-speaking schools were much higher than 
those of students in the Hebrew-speaking schools. As 
a result, the sector gap narrowed and, in the case of 
frequent testing and frequent relating what is learnt 
to daily life, Arabic-speaking students outperformed 
their peers in Hebrew-speaking schools. 

The same happened in science at the mean and 
maximal values of all instructional variables involved. 
The year gains of students in Arabic-speaking schools
were higher than those of their peers in
Hebrew-speaking schools. This caused the sector gap to narrow.
Here, too, in the case of frequent testing, Arabic- 
speaking students outperformed students in Hebrew- 
speaking schools. 

Predicted outcomes in both mathematics and
science, due to levels of the effect that shortage in
general school resources has on school capacity to
provide instruction, exhibit the same pattern. At the
maximal level of effect (schools with low resources),
the year gains of Arabic-speaking students were higher
and the sector gap narrowed and almost disappeared.
At the low levels of effect due to shortage in school
resources (affluent school conditions), the year gains
of students in Arabic-speaking schools were similar to
those of their peers in Hebrew-speaking schools, and
so the sector gap remained.

At low levels of limit to instruction caused by 
disruptive student behavior (positive school climate), 
the year gains of the two sectors were similar and the 
sector gap in favor of students in Hebrew-speaking 
schools remained the same from 1999 to 2003. 
However, at the mean and maximal levels of limit to 
teaching due to disruptive student behavior, the year 
gains of students in Arabic-speaking schools were much 
bigger than the year gains of students in the Hebrew- 
speaking schools, and so the sector gap declined. The 
negative impact on achievement of disruptive student 
behavior appeared in Hebrew-speaking schools, but it 
did not affect the achievement of students in Arabic- 
speaking schools. 

Plotting these predicted outcomes for the two 
school subjects in three separate graphs dependent 
on the level of the relevant variable and on its 
association with achievement at the two points in 
time, for Hebrew-speaking schools and for Arabic- 
speaking schools, allows us to visualize the narrowing 
achievement gap between the sectors that occurred 
from 1999 to 2003. 

The plots chosen to illustrate the narrowing 
achievement gaps in mathematics are those related to 
models that contain variables describing instruction 
and school conditions that affect learning: students 
working out problems on their own (ZCMHWPO1);
frequent relating of what is learnt to daily life
(ZCMHMDL1); frequent testing (ZCMHHQT1);
shortage in general school resources which affects 
instruction (ZRESOUR1); and disruptive student 
behavior (ZTMGLT06) (Figures 1, 2, 3, 4, and 5). 
Changes in these variables represent either efforts made 
by the teachers in their classes or efforts made at the 
school level in line with some national interventions. 

In science, the plots are those related to models 
that contain variables describing the frequency 
of students providing explanations (ZTSCSWE), 
frequency of working in small groups on investigations 
(ZCSSGRP1), having quizzes or tests (ZCSTEST1),
and that contain the variable that describes shortage 
in general school resources which affects instruction 
(ZRESOUR1) as well as the negative effect of students’ 
disruptive behavior (ZTSGLT06) (Figures 6, 7, 8, 9, 
and 10). 
Figure 1: Predicted Mathematics Scores by Sector, Year, and by Students Working Out Problems on their Own (WPO1)

Figure 2: Predicted Mathematics Scores by Sector, Year, and by Students Having a Test or a Quiz (QT1)

Figure 3: Predicted Mathematics Scores by Sector, Year, and by Students Relating What is Learnt to Daily Life (MDL1)

Figure 4: Predicted Mathematics Scores by Sector, Year, and by Effect of Shortage in General School Resources (ZRESOUR1)

Figure 5: Predicted Mathematics Scores by Sector, Year, and by Limit to Teaching due to Disruptive Student Behavior (ZTMGLT06)

Figure 7: Predicted Science Scores by Sector, Year, and by Students Working in Small Groups on Experiments and Investigations in Class (GRP1)

Figure 9: Predicted Science Scores by Sector, Year, and by Effect of Shortage in General School Resources (ZRESOUR1)

Panel                                                           Sector   1999     2003
Low Effect of Shortage in School Resources (High Resources)     Jews     486.91   505.13
                                                                Arabs    416.56   459.45
Mean Effect of Shortage in School Resources (Mean Resources)    Jews     489.35   493.28
                                                                Arabs    344.21   462.86
High Effect of Shortage in School Resources (Low Resources)     Jews     495.06   465.58
                                                                Arabs    370.52   470.82
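
The year gains and sector gaps discussed in the text can be recomputed from
the Figure 9 values. The sketch below uses the numbers as they appear in the
extracted text; assigning them to the Low, Mean, and High panels in reading
order is an assumption, and the names `FIG9`, `year_gain`, and `sector_gap`
are hypothetical.

```python
# Predicted science scores (1999, 2003) per panel and sector, as extracted.
# The panel assignment follows reading order and is an assumption.
FIG9 = {
    "low":  {"Jews": (486.91, 505.13), "Arabs": (416.56, 459.45)},
    "mean": {"Jews": (489.35, 493.28), "Arabs": (344.21, 462.86)},
    "high": {"Jews": (495.06, 465.58), "Arabs": (370.52, 470.82)},
}


def year_gain(panel, sector):
    """Change in predicted score from 1999 to 2003 for one sector."""
    score_1999, score_2003 = FIG9[panel][sector]
    return round(score_2003 - score_1999, 2)


def sector_gap(panel, year_index):
    """Jews minus Arabs predicted score; year_index 0 = 1999, 1 = 2003."""
    jews = FIG9[panel]["Jews"][year_index]
    arabs = FIG9[panel]["Arabs"][year_index]
    return round(jews - arabs, 2)


for panel in FIG9:
    print(panel,
          "gain Jews:", year_gain(panel, "Jews"),
          "gain Arabs:", year_gain(panel, "Arabs"),
          "gap 1999:", sector_gap(panel, 0),
          "gap 2003:", sector_gap(panel, 1))
```

On these values the gap in the high-shortage panel falls from about 125
points in 1999 to about -5 points in 2003, which matches the statement that
the sector gap almost disappeared in schools with low resources.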






IEA CONFERENCE PROCEEDINGS 2006 



Figure 10: Predicted Science Scores by Sector, Year, and by Limit to Teaching due to Disruptive Student Behavior (ZTSGLT06)

[Panels: Low, Mean, and High Levels of Limit Due to Student Disruptive Behavior (ZTSGLT06)]

Discussion and conclusions

This paper aimed to capture the dynamics that occurred 
in Israeli schools from 1999 to 2003, which eventually 
led to the narrowing of a persistent gap in achievement 
between students in Hebrew-speaking and Arabic- 
speaking schools. The trigger for this study was the 
relatively higher achievement gains in mathematics 
and science of students in Arabic-speaking schools 
detected in the TIMSS 2003 study in Israel. In view 
of continuing social and economic disparities between 
the two ethnic populations, and given the TIMSS 2003 
study, on the one hand, and special efforts made by the 
Ministry of Education during the 1990s to improve
schooling conditions in Arabic-speaking schools, on 
the other, our attention focused mainly on variables 
operating at the school/class level. 

In contrast to studies that measure effectiveness 
from data derived from a one-time survey, this study 
aimed to consider the effect of temporal changes in 
the frequency and/or the effectiveness of a selected 
set of school variables describing instruction, school 
resources, and school learning climate that were 
obtained at two points in time and were found to 
interact significantly with sector — ethnic affiliation of 
school — and year of study. 

The findings of this study revealed differential 
ethnicity-related changes in the frequency and 
effectiveness of instructional variables operating 
mostly at the class level and of variables describing 
material conditions and learning climate operating at 
the school level. 

Changes occurring in the frequency and
effectiveness of certain instructional modes often
reflect changing pedagogical fashions worldwide. In
contrast, changes occurring in school-level variables
that describe resources and learning climate tend to
reflect policy interventions and deliberate national
efforts invested in implementing them.

The changes that occurred in the frequency and 
effectiveness of self-managed problem-solving in 
mathematics classes in both sectors, and the increased 
emphasis given to relating study material to daily 
life in Arabic-speaking schools, reflect a new trend 
in mathematics instruction and echo a debate
in this field regarding the appropriateness of the 
“conceptual” mode of mathematics teaching versus 
the “computational” mode of mathematics teaching 
(Desimone, Smith, Baker, & Ueno, 2005). Scholars 
who advocate conceptual teaching emphasize real- 
world problem-solving: working with problems that 
have no obvious solutions, discussing alternative 
hypotheses, and using investigations to solve 
problems (Hiebert et al., 1996). Scholars in favor of a
computational instructional practice focus on routine 
drill and practice (Li, 1999). Conceptual teaching has 
been regarded as more appropriate for high-performing 
students, while computational instruction might be 
seen as more appropriate for low-performing students. 
However, recent studies suggest that low-achieving 
students can master more demanding intellectual 
problems while simultaneously learning basic skills 
(Lo Cicero, De la Cruz, & Fuson, 1999; Mayer, 
1999), and that blending demanding academic work 
in computational instruction is, in fact, advantageous 
to underachievers (Knapp, 1995). 
Findings in this study concerning independent
problem-solving and relating what is learnt to daily
life, which both fall into the category of conceptual
teaching, together with other findings obtained in
TIMSS 2003 in Israel concerning the effectiveness
of computational activities (practicing four basic
arithmetic operations, working on fractions and
decimals, interpreting graphs and tables, and writing
equations), which were more effective in
Arabic-speaking schools (Zuzovsky, 2005), all go
toward supporting the claim that this blend of practices
will improve attainment of students in Arabic-speaking
schools, which have been known as low performing.

In science, differential changes occurred in the 
frequency and effectiveness of several practices that 
can be identified with an inquiry mode of science 
instruction. Students conducting experiments on their 
own, working in small groups on investigations, and 
providing explanations for observed phenomena are 
all practices in line with this instructional approach. 
These modes of instruction became, over time, more 
prevalent and more effective in the Arabic-speaking 
schools while they grew less so in the Hebrew-speaking 
ones. As in mathematics teaching, there is debate 
in science teaching as to whether inquiry-oriented 
instruction — now regarded as mainstream pedagogy — 
which encourages students to ask questions, find out 
answers on their own, and experiment on their own, 
is congruent with some cultural and home values that 
non-mainstream students bring from home (Au, 1980; 
Philips, 1983). On the other side of this debate are 
scholars who consider that replacing the mainstream 
pedagogy will deprive non-mainstream students of 
opportunities to learn academic skills such as inquiry 
skills and scientific knowledge (Fradd & Lee, 1999; 
Ladson-Billings, 1994, 1995; Lee, 2003; Lee & Luykx, 
2005; Lubienski, 2003). These scholars advocate 
explicit instruction to non-mainstream students in 
ways that reflect the dominant culture’s rules and norms 
for classroom participation and discourse. In line with 
the second suggestion, our study indicates a growth 
in the popularity of inquiry-oriented instruction in 
Arabic-speaking schools (non-mainstream students 
and teachers) that, indeed, turned out to be positively 
associated with achievement in this sector. 

To sum up, the higher achievement gains of the 
Arab sector seem to be a result of adopting mainstream 
pedagogy in both school subjects: the conceptual 
approach in mathematics and the inquiry-oriented 
approach in science. These more demanding modes of 
instruction did not exclude more traditional, still very 
effective, modes, such as the computational mode and 
listening to lectures, which provide students in Arabic- 
speaking schools with suitable instruction. 

The other findings obtained in this study relate to 
policy interventions carried out in Israel with the aim 
of reducing inequalities between the two sectors in 
school resources and infrastructure. These efforts were 
coupled with a growing demand for accountability, 
assessment, and monitoring outcomes. The success 
and impact of these policy interventions could be 
traced through changes that occurred in the variables 
that described the effect of shortage in school resources 
and the effect of testing. 

The decrease in the index describing the harmful 
effect of shortage in school resources on instruction, 
which occurred in both sectors, but more so in 
Arabic-speaking schools, points to the success of the 
five-year plan that aimed to improve the conditions 
of Arabic-speaking schools. However, it is worth 
noting that improved conditions do not guarantee 
their effective use. The findings of this study show that 
under maximal effect of shortage (low resources), the 
year gains of Arabic-speaking schools from 1999 to 
2003 were found to be higher than those of Hebrew- 
speaking schools, causing the achievement gap in 
favor of Hebrew-speaking schools to narrow. Under 
minimal effect of shortage (high resources), the year 
gains in the two sectors were almost the same, and the 
sector gap remained. 

Over the last few years, a national assessment project 
has operated in Grades 4, 6, and 8 in Israel. Each year, 
these classes in half of the schools in Israel are tested 
in the major school subjects, including mathematics 
and science. This testing, in addition to regular teacher 
tests, is a burden on many students and teachers. While 
the frequency of testing, mostly decided at a national 
level, is similar in the two sectors and did not change 
from 1999 to 2003, it became negatively associated 
with achievement in Hebrew-speaking schools while 
becoming less negatively, and even positively, associated 
with achievement in Arabic-speaking schools. Frequent 
assessment is another example of policy with a 
differential effect that should be considered when 
planning accountability policy nationwide. 
The narrowing school achievement gap between 
the Jewish sector and the Arab sector in Israel is, in the 
first place, a result of the effects teachers have in their 
classes due to adopting and adapting suitable modes 
of instruction at the class level. At the national level, 
policy interventions, even if successfully implemented, 
do not always add to the higher achievement gains of 
students in the Arab sector. 

Appendix 

Student-level variables 

BSBGBOOK: Number of books in student’s home 
(1 = none or few to 5 = more than 200) 

BSBGHFSG: Highest level of education the student 
aspires to finish (1 = secondary to 5 = beyond first 
academic degree) 

ACADEM_F or ACADEM_M: Either parent with 
academic education (1) or lesser education (0) 

BGPLHO1: Number of people living in the student’s 
home (2 to 8 or more) 

School/class-level variables 

sector: 0 = Arabic-speaking schools; 1 = Hebrew-speaking 
schools 

year: 0 = 1999; 1 = 2003 

Student body composition variables 

CBGBOOK, CGHFSG, ACADEM-F, ACADEM-M, 
BGPLHO1 

Variables describing the student body composition are 
standardized class aggregates or means of the above- 
described student variables. 

Variables describing mathematics instruction 

Class average of student responses regarding how often 
certain instructional activities take place (1 = never to 
4 = every lesson or almost every lesson) 

CMHWSG1: Working together in small groups 
CMHMDL1: Relating what is learnt to daily life 
CMHROH1: Reviewing homework 
CMHWPO1: Working out problems on their own 
CMHHQT1: Having a quiz or test 

Variables describing science instruction 

CSDEMO1: Watching the teacher demonstrate an 
experiment 
CSEXPER1: Conducting an experiment 
CSSGRP1: Working in small groups on experiments 
CSTEST1: Having a quiz or a test 

Another instructional variable, TSCSWE, was derived from 
teacher questionnaires (on a scale from 1 = never to 
4 = every or almost every lesson). It described how often 
students were asked to provide explanations in their 
class. 

Indices describing shortage in resources that affect 
school capacity to provide instruction 

Indices built on the principal’s responses to a set of 
items describing shortage in school resources. The 
scale of these variables has four categories ranging from 
1 = not affected due to shortage to 4 = greatly affected. 

RESOUR1: Shortage in “general” school resources 
(instructional materials, budget, buildings, heating/cooling, 
space, equipment for handicapped) 
RESOUR2: Shortage in mathematics school resources 
(computers, computer software, calculators, library 
materials, audiovisual resources) 
RESOUR3: Shortage in science school resources 
(laboratory equipment, computers, software, calculators, 
library material, audiovisual resources) 

School-level variables describing learning climate 

Only two variables were chosen as indicators for this 
issue: 

LSPO1: Principal’s responses on the severity of late 
arrival at school (1 = not a problem to 3 = serious 
problem) 
CTMGLT06 (mathematics teachers) or CTSGLT06 
(science teachers): Teacher responses regarding the extent 
to which disruptive students’ behaviors limit their teaching 
(1 = not at all to 4 = a lot) 
Abstract 

This paper investigates the relationship between eighth- 
grade students’ achievement and self-perceptions in 
mathematics and science by analyzing the three waves 
of the Trends in International Mathematics and Science 
Study (TIMSS) data. A total of three measures on self- 
perception were used, namely, how much students like 
the two subjects, their self-perceived competence in the 
subjects, and their perceived easiness of the subjects. For 
within-country data, with individual student as the unit 
of analysis, there is generally a positive correlation between 
students’ achievement and their self-perception. But when 
the three self-perception measures are aggregated at the 
country level, the relationship is reversed. In other words, 
there is a negative correlation between self-perceptions 
and achievement on a between-country analysis with 
country as the unit of analysis. This pattern is consistent 
in both mathematics and science across all three waves of 
data, even though the sample sizes (number of countries) 
and the participating countries vary from wave to wave. 
One possible explanation for this finding is that high- 
performing countries have higher academic standards; 
their students face greater pressure to get into top-choice 
academic institutions by excelling in public examinations. 
Accordingly, students from these countries have better 
academic performance in science and mathematics on 
average, but lower preference for these subjects. 



Introduction 

There is a wide range of difference in students’ 
mathematics and science performance across 
countries in each wave of the Trends in International 
Mathematics and Science Study (TIMSS 1995, 1999, 
and 2003). As an illustration, for the eighth-grade 
students’ results in TIMSS 2003, Singapore’s average 
scores were as high as 605 and 578 for mathematics 
and science respectively, while South Africa’s were 264 
and 244 (Martin, Mullis, Gonzalez, & Chrostowski, 
2004; Mullis, Martin, Gonzalez, & Chrostowski, 
2004). Nonetheless, every school system is different 
with respect to its unique sociocultural and economic 
context, and this is why the task of explaining why 
there is so much cross-national variation in students’ 
achievement in the two subjects must be conducted 
with caution. Challenging as the task may be, this 
study examined the relationship between students’ 
achievement scores and their self-perception of the 
two subjects and explored some plausible explanations 
behind the vast cross-national variation in students’ 
performance. 



According to cognitive psychologists and motivation 
theorists (e.g., Bandura, 1994), positive attitudes 
toward learning and positive self-perceptions of 
competence can foster motivation, 
thereby enhancing students’ academic achievement. 
Many empirical studies have tested these assumptions 
and generally support this hypothesized continuous 
feedback loop between people’s self-evaluation, or 
self-efficacy beliefs, intrinsic interest, motivation, 
and accomplishment (Brown, Lent, & Larkin, 
1989; Locke & Latham, 1990; Multon, Brown, & 
Lent, 1991; Schunk, 1989, 1991; Zimmerman & 
Bandura, 1994; Zimmerman, Bandura, & Martinez- 
Pons, 1992). However, these studies and motivation 
theories are basically rooted in western culture and 
social conditions. Moreover, self-conceptions can vary 
from culture to culture. In order to avoid culturally 
biased results, it is wise to consider the relationship 
between students’ academic achievement and their self- 
perceptions cross-nationally by means of inspecting 
cross-national data. 
Now that the TIMSS data are available, the 
hypothesized relationship can be tested out on a 
larger number of countries with different sociocultural 
and economic backgrounds. Seemingly, analysis at 
the country level can go beyond such psychological 
theories as motivation and self-efficacy. This is because 
many psychological theories operate at the individual 
level, not at the country level. We hoped that by 
analyzing aggregated data at the country level, we 
could garner insights with respect to explaining cross- 
national variation of students’ academic achievement 
that may not be apparent by analyzing the data at the 
individual level alone. 

Hypotheses 

In this study, we tested two groups of null hypotheses. 
The first group was based on within-country data (with 
the individual student as the unit of analysis): 

1. There is no correlation between students’ 
mathematics and science achievement scores 
and the extent of how much they like the two 
subjects. 

2. There is no correlation between students’ 
mathematics and science achievement scores and 
their self-evaluation of their competence in these 
two subjects. 

3. There is no correlation between students’ 
mathematics and science achievement scores and 
their perceived easiness of the two subjects. 

4. There is no correlation among the three measures 
of self-perception, namely, how much students 
like the two subjects, their self-evaluation of 
their competence in the two subjects, and their 
perceived easiness of the two subjects. 

The second group of null hypotheses was based 
on between-country data (with country as the unit of 
analysis): 

5. There is no correlation between students’ mean 
mathematics and science achievement scores and 
their average self-perception about how much 
they like the two subjects. 

6. There is no correlation between students’ mean 
mathematics and science achievement scores 
and their average self-evaluation of their level of 
competence in these two areas. 

7. There is no correlation between students’ mean 
mathematics and science achievement scores 
and their average perceived easiness of the two 
subjects. 



8. There is no correlation among students’ average 
self-perceptions about how much they like the 
two subjects, their average self-evaluation of their 
competence in the two subjects, and their average 
perceived easiness of the two subjects. 

Countries/school systems included in this 
study 

Forty countries/school systems participated in the 
TIMSS 1995 study at the eighth-grade level. In 
the second wave, only 38 countries/school systems 
participated in the TIMSS 1999 study at the eighth- 
grade level. Of these 38 countries, 26 participated in 
TIMSS 1995. In the third wave, 46 countries/school 
systems participated in the TIMSS 2003 study at 
the eighth-grade level, including 35 countries/school 
systems that participated in one or both of the TIMSS 
1995 and 1999 studies. 

A noticeable difference between TIMSS 1995 
(Beaton, Mullis, Martin, Gonzalez, Kelly, & Smith, 
1996) and TIMSS 1999 and 2003 was that quite a 
few of the developed European countries in the 1995 
study chose not to participate in the two later studies. 
These countries included Austria, Denmark, France, 
Germany, Greece, Iceland, Ireland, Portugal, and 
Switzerland. At the same time, a good many 
new participants joined in the later studies. Many 
of these are developing countries and regions, or countries 
in transition. In TIMSS 1999, the new participants 
included Chile, Chinese Taipei, Finland, Indonesia, 
Jordan, Macedonia, Malaysia, Moldova, Morocco, 
Philippines, Tunisia, and Turkey. In TIMSS 2003, 
the new participants included Armenia, Bahrain, 
Botswana, Chile, Egypt, Estonia, Ghana, Lebanon, 
Palestine, Saudi Arabia, Scotland, and Serbia and 
Montenegro. 

Data, measurement, and methods 

Using data from the three waves of TIMSS (1995, 
1999, and 2003), this study investigated the 
relationship between Grade 8 students’ mathematics 
and science achievement and three measures of their 
self-perceptions on these two subjects. For the TIMSS 
1995 and 1999 data, the first measure was the response 
to the statement, “I like mathematics (or science),” 
which was used as an indicator of self-perceived 
attitude toward the two subjects. Responses were based 
on a Likert scale, the four points of which were 1 = 
dislike a lot, 2 = dislike, 3 = like, and 4 = like a lot. As 
for the TIMSS 2003 data, the corresponding question 
was slightly changed to “I enjoy learning mathematics 
(or science).” The second measure was the response 
to the statement, “I usually do well in mathematics 
(or science),” which was used as an indicator of self- 
efficacy or self-perceived competence in mathematics 
and science. The third measure of self-perception was, 
for both the TIMSS 1995 and 1999 data, the response 
to the statement, “Mathematics (or science) is an easy 
subject,” and was used as a proxy of self-perceived 
rigor of the subjects. As for the TIMSS 2003 data, 
this measure was modified to the statement, “I learn 
things quickly in mathematics (or science).” So far as 
the coding is concerned, the latter two measures of 
self-perception for the TIMSS 1995 and 1999 data 
were based on a four-point Likert scale and were both 
coded as 1 = strongly disagree, 2 = disagree, 3 = agree, 
and 4 = strongly agree. For the TIMSS 2003 data, 
all three measures of self-perceptions used the same 
four-point Likert scale, which, after recoding, was 1 
= disagree a lot, 2 = disagree a little, 3 = agree a little, 
and 4 = agree a lot. We believe, nevertheless, that even 
though the questions for two of the three measures 
of self-perceptions and the response categories were 
somewhat modified throughout the three waves of the 
study, the underlying theoretical constructs remained 
basically the same. 

It should be pointed out that, in some countries, 
Grade 8 science is taught as an integrated subject, 
whereas in others, it is taught separately as several 
science subjects, including physics, chemistry, biology, 
earth science, and environmental science. Thus, in 
the TIMSS study, two versions of students’ science 
background questionnaires were prepared. While 
one version asked questions with respect to science 
being taught as an integrated subject, the other asked 
questions with respect to science being taught as 
several separate areas. Both versions contained the 
three items on self-perception in science as mentioned 
earlier. Students only responded to the version of 
questionnaire that matched the way science was taught 
in their schools. For the version with science being 
taught as several separate areas, a mean for each student 
was computed across as many of the science areas as 
were taught in the school that he or she attended. In 
this way, a single variable with a value in the range of 
1 to 4 was used for each measure of self-perception, 
regardless of which version of the questionnaires the 
students filled out. 
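The averaging step described above can be sketched as follows. This is a minimal illustration only: the function name, the use of None for a science area that is not taught in the student's school, and the sample responses are assumptions introduced here and are not taken from the TIMSS data files.

```python
from typing import Optional, Sequence


def science_self_perception(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Collapse one student's 1-4 responses into a single value.

    Each entry is the response for one science area; None marks an area
    that is not taught in the student's school (an assumed convention).
    A student answering the integrated-science version has one entry.
    """
    valid = [r for r in responses if r is not None]
    if not valid:
        return None  # no usable response for this student
    return sum(valid) / len(valid)


# Hypothetical integrated-science student: a single response.
integrated = science_self_perception([3])

# Hypothetical separate-science student: three areas taught, one not.
separate = science_self_perception([4, 3, 2, None])

print(integrated, separate)
```

Either way, each student ends up with one value between 1 and 4 per self-perception measure, which is what allows the two questionnaire versions to be analyzed together.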

We are well aware of the limitations of using the 
three measures as indicators of the three concepts — 
self-perceived attitudes toward the two subjects, self- 
perceived competence or self-efficacy in the two subjects, 
and self-perceived rigor of the subjects. Scales based 
on multiple items can provide more reliable measures 
of the concepts, but the TIMSS student background 
questionnaire in 1995 unfortunately did not ask a 
series of questions with which we could have developed 
scales to measure the three concepts. For TIMSS 2003, 
the International Study Center at Boston College did 
develop several indexes measuring students’ confidence 
in their ability to learn mathematics and science, and 
students’ value on mathematics and science (Martin 
et al., 2004; Mullis et al., 2004). However, in order 
to maintain consistency across the three waves of the 
study, we simply used a single item as an indicator for 
each concept as described above. 

We also recognize that many factors may influence 
how students respond to the statements mentioned 
above, including their academic goals and aspirations, 
their parents’ and/or teachers’ expectations, academic 
standards, the rigor of curriculum, etc. Yet by using 
aggregate data to measure such concepts as self- 
perceived attitude toward the two subjects, self- 
efficacy, and self-perceived rigor of mathematics and 
science for each country, we move the unit of analysis 
from the student level to the country level. What is 
special about the TIMSS study is that it encompasses 
countries with very different cultural backgrounds, 
as well as different social, historical, and political 
backgrounds. Besides, teaching and learning are 
cultural activities (Kawanaka, Stigler, & Hiebert, 
1999). Thus, under these circumstances, the meaning 
of the measures will be further affected by the different 
social and cultural contexts of the participating TIMSS 
countries. Moreover, examination of the data collected 
in the TIMSS curriculum analysis reveals substantial 
differences cross-nationally (Schmidt, McKnight, 
Valverde, Houang, & Wiley, 1997; Schmidt, Raizen, 
Britton, Bianchi, & Wolfe, 1997). All these cross- 
national differences need to be taken into account 
when examining the relationship between students’ 
achievement and the three self-perception measures 
used in the present study. For example, “strongly agree” 
does not necessarily mean exactly the same thing in 
different languages and cultural contexts. Therefore, 
caution must be taken when drawing conclusions 
from cross-national comparisons based on these items. 
Nonetheless, these constraints should not preclude us 
from utilizing the data from this unprecedented large- 
scale study to test the possible relationship between 
students’ achievement and the three self-perception 
concepts, provided that caution is exercised to avoid 
over-generalizing any findings. 

The methodology of this study is relatively 
straightforward. For each wave of the TIMSS data, we 
performed two sets of correlation analyses. The first 
set of analyses, to test the first group of hypotheses, 
was the correlation analysis for within-country data at 
the individual student level. We examined, separately 
for each of the participating countries, the correlation 
between students’ mathematics and science test scores 
and the three measures of their self-perception: how 
much they liked or enjoyed the two subjects, their 
self-perceived competence in the subjects, and their 
perceived easiness of the subjects. To test the second 
set of hypotheses, we performed correlation analyses 
at the country level (between-country analyses), 
with country as the unit of analysis. We analyzed 
the aggregate data for both test scores (country-level 
achievement scores) and the averages of self-perception 
measures at the country level. 
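The two sets of analyses can be illustrated with a small sketch. The data below are invented purely for illustration (three hypothetical countries with four students each) and are not TIMSS values; the function names are likewise assumptions. The sketch shows only how the unit of analysis changes between the two sets of correlations.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented data: country -> list of (achievement score, self-perception 1-4).
students = {
    "A": [(580, 2), (600, 2), (620, 3), (640, 3)],
    "B": [(480, 2), (500, 3), (520, 3), (540, 4)],
    "C": [(380, 3), (400, 3), (420, 4), (440, 4)],
}


def within_country(data):
    """One correlation per country, with the student as the unit of analysis."""
    return {
        country: pearson([s for s, _ in rows], [p for _, p in rows])
        for country, rows in data.items()
    }


def between_country(data):
    """One correlation overall, with the country mean as the unit of analysis."""
    mean_scores = [sum(s for s, _ in rows) / len(rows) for rows in data.values()]
    mean_ratings = [sum(p for _, p in rows) / len(rows) for rows in data.values()]
    return pearson(mean_scores, mean_ratings)


print(within_country(students))
print(between_country(students))
```

In this invented example the correlation is positive inside every country yet negative across the country means, which shows that the two levels of analysis need not agree in sign.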

Results of the analyses 

The results of the analyses are reported in the order of 
the eight hypotheses stated earlier. We anticipated that 
because of the large sample size for each participating 
country, the correlation coefficients would almost 
certainly be significant. Therefore, the significance 
level is not reported for within-country data analysis. 
Readers are advised to look at the magnitude of 
the correlation coefficients as measures of the 
corresponding effect sizes. As regards the correlation 
coefficients for between-country data analyses, the 
significance level and sample size for each coefficient are 
reported simultaneously because the sample size is not 
large under this setting. 
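The dependence of significance on sample size can be made concrete with the usual t statistic for a Pearson correlation, t = r * sqrt((n - 2) / (1 - r^2)). The sketch below is illustrative only: the sample size of 4,000 students is an assumed round figure, not a TIMSS count, and the calculation ignores the clustered sample design.

```python
from math import sqrt


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing that a Pearson correlation is zero,
    assuming simple random sampling (a simplification for TIMSS data)."""
    return r * sqrt((n - 2) / (1.0 - r * r))


# The same weak correlation at two very different sample sizes.
t_students = t_statistic(0.05, 4000)   # hypothetical within-country sample
t_countries = t_statistic(0.05, 46)    # 46 school systems, as in TIMSS 2003

print(round(t_students, 2), round(t_countries, 2))
```

With several thousand students, even a correlation of .05 exceeds the conventional two-sided critical value of about 1.96, whereas with 46 countries it falls far short of it. This is why the magnitude of the coefficient is more informative than its significance level in the within-country analyses.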

To save space, we include here only the Pearson 
correlation matrix of students’ achievement scores 
on mathematics and science and the three measures 
of self-perception of TIMSS 2003 data (see Table 1). 
Those readers who are interested in the within-country 
correlation matrix for TIMSS 1995 and 1999 data 
should refer to Shen (2002) and Shen and Pedulla 
(2000). Even though the number of countries or 
school systems varies from wave to wave, the general 
pattern still holds. 

Columns 1 and 2 of Table 1 report, for each of the 
46 school systems, the correlation coefficients between 
students’ achievement scores in the two subjects and 
their responses to how much they enjoyed the two 
subjects. To reiterate, for TIMSS 1995 and 1999, the 
corresponding question inquired how much students 
“liked” the two subjects rather than enjoyed them. 
As shown in the table, there is, within each country, 
a positive relationship between students’ actual 
score and their enjoyment of the two subjects, with 
only one exception — Indonesia. For most countries, 
the correlation coefficients fall between .10 and 
.40, fluctuating from country to country and from 
mathematics to science. This indicates that within 
each participating country, students who reported 
enjoying or liking mathematics and science tended to 
have higher achievement in these areas than students 
who reported less enjoyment or liking of the two 
subjects. Even though the strength of relationship 
is not particularly strong, this result supports the 
conventional motivation theory discussed earlier. 
Thus, hypothesis 1 is rejected, and it is concluded that 
there is some evidence in support of the alternative 
hypothesis. It should be pointed out that since the 
sample design of TIMSS is a two-stage stratified cluster 
sample design, a jackknife procedure was used in this 
paper to take into account the clustering effect when 
the correlation coefficients and their standard errors 
were computed for each country. 
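A jackknife of this general kind can be sketched as follows. The paper does not give the details of the procedure, so everything here is an assumption made for illustration: the pairing of sampled schools into zones, the rule of zeroing one school's weights and doubling the other's within a zone, the variance formula as a sum of squared deviations of the replicate estimates, and all of the data values.

```python
from math import sqrt


def weighted_pearson(rows):
    """Weighted Pearson correlation; rows are (weight, x, y) tuples."""
    rows = [(w, x, y) for w, x, y in rows if w > 0]
    sw = sum(w for w, _, _ in rows)
    mx = sum(w * x for w, x, _ in rows) / sw
    my = sum(w * y for w, _, y in rows) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in rows)
    sxx = sum(w * (x - mx) ** 2 for w, x, _ in rows)
    syy = sum(w * (y - my) ** 2 for w, _, y in rows)
    return sxy / sqrt(sxx * syy)


def jackknife_correlation(sample):
    """sample rows are (zone, half, weight, score, rating), with half in {0, 1}
    identifying the two schools paired in a zone (assumed design)."""
    full = weighted_pearson([(w, x, y) for _, _, w, x, y in sample])
    variance = 0.0
    for zone in sorted({row[0] for row in sample}):
        replicate = []
        for z, half, w, x, y in sample:
            if z == zone:
                # Drop one school of the pair and double the other.
                w = 0.0 if half == 0 else 2.0 * w
            replicate.append((w, x, y))
        variance += (weighted_pearson(replicate) - full) ** 2
    return full, sqrt(variance)


# Invented sample: three zones, two schools per zone, two students per school.
sample = [
    (1, 0, 1.0, 500, 2), (1, 0, 1.0, 540, 3), (1, 1, 1.0, 520, 3), (1, 1, 1.0, 580, 4),
    (2, 0, 1.2, 450, 2), (2, 0, 1.2, 470, 2), (2, 1, 1.2, 490, 3), (2, 1, 1.2, 430, 1),
    (3, 0, 0.8, 600, 4), (3, 0, 0.8, 560, 3), (3, 1, 0.8, 610, 3), (3, 1, 0.8, 530, 2),
]

r, se = jackknife_correlation(sample)
print(r, se)
```

Because whole schools are dropped and reweighted rather than individual students, the resulting standard error reflects the similarity of students within the same school, which a formula based on simple random sampling would ignore.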

Columns 3 and 4 of Table 1 present the Pearson 
correlation coefficients for students’ achievement 
scores (for both mathematics and science) and their 
self-perceived competence in the two subjects. 
Again, there is, as shown, a positive relationship for 
all countries except Indonesia. On average, the 
magnitudes of the coefficients are greater than those 
shown in Columns 1 and 2. For several school systems, 
the correlation coefficients are as high as .50 (Chinese 
Taipei, Korea, and Norway). Hence the strength of 
the relationship between the two variables ranges from 
low to medium effect sizes. These statistics indicate 
that, within each participating country, students 
who reported doing well in mathematics and science 
tended to have higher achievement in these areas than 
students who reported doing less well. This result 
supports the self-efficacy theory discussed earlier and 
the conclusion reached by many prior research studies 
(Schunk, 1989; Zimmerman et al., 1992). Thus, 
hypothesis 2 is rejected, and it is concluded that there 
is evidence in support of the alternative hypothesis. 

C. SHEN & H. P. TAM: RELATIONSHIP BETWEEN STUDENTS’ ACHIEVEMENT AND SELF-PERCEPTION 

Table 1: Correlations between Achievement Scores of Mathematics and Science and Three Measures of Self-perception 
for Grade 8 Students in 46 School Systems Based on TIMSS 2003 Data (in alphabetical order) 

Country               “I enjoy      “I enjoy      “I do well    “I do well    “I learn      “I learn
(school system)       math”         science”      in math”      in science”   math quickly” science quickly”

Armenia                0.166         0.106         0.173         0.145         0.210         0.169
Australia              0.222         0.192         0.395         0.274         0.391         0.249
Bahrain                0.156         0.055         0.259         0.121         0.316         0.161
Belgium (Flemish)      0.180         0.085         0.126         0.150         0.196         0.177
Botswana               0.203         0.296         0.113         0.101         0.055         0.122
Bulgaria               0.180         0.089         0.319         0.163         0.265         0.154
Chile                  0.056         0.018         0.263         0.102         0.276         0.127
Chinese Taipei         0.462         0.274         0.513         0.333         0.452         0.274
Cyprus                 0.304         0.145         0.468         0.211         0.420         0.208
Egypt                  0.114         0.203         0.146         0.136         0.146         0.167
England                0.098         0.195         0.263         0.297         0.241         0.272
Estonia                0.175         0.053         0.440         0.198         0.425         0.168
Ghana                  0.219         0.331         0.100         0.214         0.085         0.227
Hong Kong SAR          0.315         0.262         0.305         0.196         0.310         0.190
Hungary                0.250         0.094         0.452         0.218         0.475         0.217
Indonesia              0.030        -0.071        -0.122        -0.229        -0.055        -0.104
Iran                   0.154         0.065         0.305         0.203         0.298         0.167
Israel                 0.055         0.119         0.300         0.284         0.302         0.282
Italy                  0.319         0.127         0.435         0.259         0.429         0.212
Japan                  0.310         0.257         0.470         0.403         0.385         0.291
Jordan                 0.150         0.089         0.225         0.115         0.258         0.193
Korea, Rep. of         0.397         0.294         0.565         0.438         0.478         0.345
Latvia                 0.237         0.099         0.440         0.232         0.403         0.182
Lebanon                0.255         0.153         0.300         0.177         0.310         0.198
Lithuania              0.235         0.086         0.410         0.183         0.350         0.189
Macedonia              0.072         0.059         0.206         0.128         0.204         0.130
Malaysia               0.276         0.214         0.400         0.207         0.278         0.164
Moldova                0.192         0.097         0.238         0.167         0.260         0.156
Morocco                0.088         0.057         0.182         0.112         0.136         0.110
Netherlands            0.042         0.045         0.227         0.155         0.263         0.168
New Zealand            0.115         0.145         0.388         0.283         0.372         0.276
Norway                 0.272         0.170         0.505         0.288         0.474         0.216
Palestine              0.196         0.161         0.289         0.261         0.294         0.244
Philippines            0.161         0.211         0.076         0.050         0.067         0.116
Romania                0.245         0.106         0.408         0.193         0.346         0.150
Russian Federation     0.266         0.098         0.445         0.270         0.414         0.217
Saudi Arabia           0.057         0.104         0.214         0.144         0.193         0.176
Scotland               0.058         0.226         0.329         0.402         0.279         0.389
Serbia & Montenegro    0.232         0.030         0.455         0.147         0.504         0.171
Singapore              0.275         0.221         0.333         0.208         0.342         0.221
Slovak Republic        0.207         0.067         0.382         0.195         0.443         0.198
Slovenia               0.152         0.072         0.398         0.216         0.430         0.198
South Africa           0.009         0.053         0.060         0.013         0.058         0.056
Sweden                 0.164         0.152         0.446         0.204         0.405         0.238
Tunisia                0.215         0.092         0.250         0.132         0.294         0.189
United States          0.129         0.144         0.376         0.266         0.328         0.241

Columns 5 and 6 of Table 1 present the Pearson 
correlation coefficients for students’ achievement 
scores (for both mathematics and science) and how 
quickly they learned the two subjects. For the TIMSS 
1995 and 1999 data, the corresponding question was 
how easy they perceived the two subjects to be. The 
pattern and the magnitude of the Pearson’s correlation 
coefficients are similar to those in Columns 1 to 4. We 
can therefore say that, for almost all countries, students 
who thought they learned quickly usually performed 
better in TIMSS than those who thought otherwise. 
Again, hypothesis 3 is rejected, and it is concluded 
that there is evidence in support of the alternative 
hypothesis. 

For the TIMSS 2003 data with 46 participating 
school systems, the only country with a negative 
correlation between students’ achievement and the 
three self-perception measures is Indonesia. For 
within-country data, throughout the three waves there 
is generally a positive relationship between students’ 
achievement and the three measures of their self- 
perception. The general pattern of a positive association 
between students’ achievement scores and their self- 
perception is consistent with conventional wisdom 
and supports existing psychological and motivation 
theories. 

By and large, the relationship among the three 
measures of self-perception for TIMSS participating 
school systems/countries is stronger and more 
consistent than the relationship between students’ 
achievement scores and their self-perception measures. 
Due to the similarity of the general pattern throughout 
the waves, and in order to save space, only four 
countries’ correlation coefficients among the three 
measures of self-perception from TIMSS 2003 are 
reported in Table 2. The countries selected are Chile, 
Japan, Morocco, and the United States. They represent 
a wide spectrum of difference in performance levels 
and in cultural and geographical characteristics as 
well. While Japan is a high-performing country, the 
United States is at the middle range, and Chile and 
Morocco are relatively low-performing countries, all 
with different cultural and geographical backgrounds. 



As shown in the table, the findings across the 
four countries are consistent. Despite the diverse 
achievement levels and variation in cultural and 
geographical factors, there is a clear positive relationship 
among the three measures of self-perception within 
each country, indicating that students who enjoyed or 
liked the subjects also perceived themselves as doing 
well, and thought that they learned the two subjects 
quickly or perceived the two subjects as easy. Thus, 
hypothesis 4 is rejected, and it is concluded that there 
is evidence in support of the alternative hypothesis. 
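
The within-country correlations among the three self-perception items, which Table 2 reports "using listwise deletion," can be illustrated with a minimal sketch. The function names, the 1-to-4 response scale, and the sample responses below are hypothetical and are not taken from the TIMSS files; the sketch only shows the general technique of dropping incomplete cases before computing Pearson correlations.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def listwise_correlations(rows):
    """rows: tuples (enjoy, do_well, learn_quickly); None marks a missing answer.

    Listwise deletion keeps only students who answered all three items.
    """
    complete = [r for r in rows if all(v is not None for v in r)]
    cols = list(zip(*complete))
    k = len(cols)
    matrix = [[pearson(cols[i], cols[j]) for j in range(k)] for i in range(k)]
    return len(complete), matrix


# Hypothetical responses on an assumed 1-4 agreement scale (not TIMSS data).
responses = [
    (4, 4, 3), (3, 3, 3), (2, 1, 2), (1, 2, 1),
    (4, None, 4), (3, 4, 4), (None, 2, 2), (2, 2, 1),
]
n_used, corr = listwise_correlations(responses)
print(n_used)
for row in corr:
    print([round(v, 3) for v in row])
```

Because incomplete cases are dropped before any coefficient is computed, every cell of the matrix rests on the same students, which is why a single N can be reported for each matrix.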

Findings from the within-country data analysis 
support the conventional wisdom about the relationship 
of students’ academic achievement and their attitudes 
toward the subjects, their self-perceived competence, 
and their perceived rigor of the two subjects. 

The next phase of analysis concerns the aggregate 
data level, that is, country or school system level. Here, 
we investigated if the pattern found from the within- 
country analysis would still hold for the cross-national 
analyses. For all three waves of TIMSS data, the 
first step was to compute the mean mathematics and 
science achievement scores together with the means of 
the three measures of self-perception for each country. 
The correlation coefficients for the relevant pairs of 
variables were then computed at the country level. 
As mentioned earlier, each student responded to one 
of the two versions of the background questionnaire. 
For those students where science was taught as several 
separate subjects, a mean score for each student was 
computed across as many science subject areas as there 
were data. 
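
The aggregation steps just described can be sketched as follows. This is an illustration under stated assumptions, not the authors' actual procedure or code: the record layout, the function names, and the numbers are hypothetical. Each student's rating is averaged over whichever science subjects have data, country means are then formed, and the Pearson correlation is computed with country as the unit of analysis.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def student_rating(ratings):
    """Mean over the subject ratings that are present; None if all are missing."""
    present = [r for r in ratings if r is not None]
    return sum(present) / len(present) if present else None


def country_level_correlation(records):
    """records: (country, achievement, [rating per science subject])."""
    by_country = defaultdict(list)
    for country, score, ratings in records:
        rating = student_rating(ratings)
        if rating is not None:
            by_country[country].append((score, rating))
    mean_scores, mean_ratings = [], []
    for pairs in by_country.values():
        mean_scores.append(sum(s for s, _ in pairs) / len(pairs))
        mean_ratings.append(sum(r for _, r in pairs) / len(pairs))
    return pearson(mean_scores, mean_ratings)


# Hypothetical records for three countries (not TIMSS data).
records = [
    ("A", 590, [2, None, 1]), ("A", 570, [1, 2, 2]),
    ("B", 500, [3]), ("B", 480, [2]),
    ("C", 400, [4, 3, None]), ("C", 380, [3, 3, 4]),
]
r_country = country_level_correlation(records)
print(round(r_country, 3))
```

With only three hypothetical countries the coefficient itself is meaningless; the point is the order of operations, in which student-level information is collapsed to one pair of means per country before any correlation is computed.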

Tables 3 and 4 present the correlation analysis 
results based on the aggregate data from TIMSS 1995 
for each country’s achievement scores in the two areas 
and the three measures of self-perception at the country 
level. It is interesting to note that the relationships are 
negative between the self-perception measures and 
the achievement scores in mathematics and science. 
Besides, the magnitudes of the correlation coefficients 
are moderately strong. The negative correlation stands 
in sharp contrast to the pattern of positive correlations 
found within the majority of countries as discussed 
above. The general pattern is that those countries where 
most students said they did not like mathematics and 
science, thought they usually did not do well in the 
two subjects, and perceived the subjects as difficult 
were usually high-performing countries, and vice versa. 

Table 2: Correlations of Three Measures of Self-perception for Selected Countries (TIMSS 2003)

Chile

Mathematics (N = 6,286, using listwise deletion)
                                        1        2        3
1. “I enjoy learning math”            1.000
2. “I do well in math”                0.461    1.000
3. “I learn math quickly”             0.482    0.566    1.000

Science (N = 6,269, using listwise deletion)
                                        1        2        3
1. “I enjoy learning science”         1.000
2. “I do well in science”             0.476    1.000
3. “I learn science quickly”          0.506    0.538    1.000

Japan

Mathematics (N = 4,699, using listwise deletion)
                                        1        2        3
1. “I enjoy learning math”            1.000
2. “I do well in math”                0.415    1.000
3. “I learn math quickly”             0.448    0.541    1.000

Science (N = 4,781, using listwise deletion)
                                        1        2        3
1. “I enjoy learning science”         1.000
2. “I do well in science”             0.448    1.000
3. “I learn science quickly”          0.506    0.560    1.000

Morocco

Mathematics (N = 2,448, using listwise deletion)
                                        1        2        3
1. “I enjoy learning math”            1.000
2. “I do well in math”                0.400    1.000
3. “I learn math quickly”             0.448    0.498    1.000

Science (N = 2,527, using listwise deletion)
                                        1        2        3
1. “I enjoy learning science”         1.000
2. “I do well in science”             0.319    1.000
3. “I learn science quickly”          0.358    0.441    1.000

United States

Mathematics (N = 8,592, using listwise deletion)
                                        1        2        3
1. “I enjoy learning math”            1.000
2. “I do well in math”                0.514    1.000
3. “I learn math quickly”             0.518    0.661    1.000

Science (N = 8,556, using listwise deletion)
                                        1        2        3
1. “I enjoy learning science”         1.000
2. “I do well in science”             0.563    1.000
3. “I learn science quickly”          0.581    0.655    1.000

We also note that the three self-perception measures 
correlate positively with one another. The TIMSS 
1995 study also included data from Grades 3, 4, and 
7. Although the number of participating countries was 
just under 40, the general pattern remained the same. 
To save space, we have not included them here. 

Tables 5 and 6 present the correlation analysis results 
based on the aggregate data from TIMSS 1999 for each 
country’s achievement in the two areas and the same 
three measures of self-perception at the country level. 
Thirty-eight countries (school systems) participated 
in the TIMSS 1999 study (Grade 8 students in most 
countries). Because no data were reported for the 
Netherlands on the extent to which their students 
liked the subjects, the sample size for some correlation 
coefficients dropped to 37. The coefficients in Tables 5 
and 6 show a pattern very similar to that in Tables 3 
and 4 for the TIMSS 1995 data. The magnitudes of 
the correlation coefficients are slightly larger for the 
TIMSS 1999 data, which probably relates to the greater 
variation in achievement scores that resulted from 
some European countries not participating in, and 
more developing countries joining, the 1999 study. 

The TIMSS 2003 study had the largest number 
of participating countries in the history of the cross- 
national educational study. Tables 7 and 8 present 
the correlation analysis results based on the TIMSS 
2003 aggregate data for each country’s achievement in 
the two areas and the three similar measures of self- 
perception at the country level (Grade 8 students in 
most countries). As can be seen in these tables, many 
of the coefficients are fairly large in magnitude. 

To facilitate a visual interpretation of the data, 
we also provide six scatterplots to illustrate the 
relationship between the achievement scores and the 
three self-perception measures in the two subjects for 
the 46 participating school systems. 

Figure 1 is the scatterplot of mathematics 
achievement scores versus the eighth grade students’ 
responses to “I enjoy learning mathematics” for the 
TIMSS 2003 participating countries. The Pearson 
correlation coefficient was strongly negative, r = -.708, 
p < .001, N = 46. The figure shows that the few top 
mathematics-performing school systems (upper-left 
hand corner of the figure), such as Chinese Taipei, Hong 

Table 3: Correlations between International Mathematics Achievement Scores and Three Measures of Self-perception 
for Grade 8 Students at the Country Level (N = 40 Countries/School Systems; TIMSS 1995 Data) 

                                          1         2         3         4
1. Mathematics score                    1.00
2. “I like math”                        -.44**    1.00
3. “I usually do well in math”          -.56**     .45**    1.00
4. “Math is an easy subject”            -.63**     .59**     .53**    1.00

Note: **p < 0.01. 



Table 4: Correlations between International Science Achievement Scores and Three Measures of Self-perception for 
Grade 8 Students at the Country Level (N = 40 Countries/School Systems; TIMSS 1995 Data) 

                                          1         2         3         4
1. Science score                        1.00
2. “I like science”                     -.41**    1.00
3. “I usually do well in science”       -.37*      .37*     1.00
4. “Science is an easy subject”         -.62**     .56**     .61**    1.00

Note: *p < 0.05; **p < 0.01. 



Table 5: Correlations between International Mathematics Achievement Scores and Three Measures of Self-perception 
for Grade 8 Students at the Country Level (TIMSS 1999 Data) 

                                          1              2              3              4
1. Mathematics score                    1.00
2. “I like math”                        -0.68** (37)   1.00
3. “I usually do well in math”          -0.60** (38)   0.61** (37)    1.00
4. “Math is an easy subject”            -0.72** (38)   0.87** (37)    0.65** (38)    1.00

Notes: **p < 0.01. 

The number in parentheses after each correlation coefficient is the number of school systems (countries). 



Table 6: Correlations between International Science Achievement Scores and Three Measures of Self-perception for 
Grade 8 Students at the Country Level (TIMSS 1999 Data) 

                                          1              2              3              4
1. Science score                        1.00
2. “I like science”                     -0.56** (37)   1.00
3. “I usually do well in science”       -0.44** (38)   0.61** (37)    1.00
4. “Science is an easy subject”         -0.74** (38)   0.73** (37)    0.71** (38)    1.00

Notes: **p < 0.01. 

The number in parentheses after each correlation coefficient is the number of school systems (countries). 

Table 7: Correlations between International Mathematics Achievement Scores and Three Measures of Self-perception 
at the Country Level (N = 46 Countries/School Systems; TIMSS 2003 Data) 

                                          1         2         3         4
1. Mathematics score                    1.00
2. “I enjoy math”                       -0.71**   1.00
3. “I do well in math”                  -0.64**   0.53**    1.00
4. “I learn math quickly”               -0.70**   0.67**    0.87**    1.00

Note: **p < 0.01. 



Table 8: Correlations between International Science Achievement Scores and Three Measures of Self-perception at the 
Country Level (N = 46 Countries/School Systems; TIMSS 2003 Data) 

                                          1         2         3         4
1. Science score                        1.00
2. “I enjoy science”                    -0.71**   1.00
3. “I do well in science”               -0.65**   0.53**    1.00
4. “I learn science quickly”            -0.74**   0.77**    0.94**    1.00

Note: **p < 0.01. 



Kong SAR, Japan, Korea, and the Netherlands, have a 
relatively low level of enjoyment of the subject, while 
low-performing countries, such as Botswana, Egypt, 
Ghana, Morocco, and South Africa, generally have a 
high level of enjoyment in learning mathematics. 
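
As a rough check on the significance level reported for Figure 1 (r = -.708, N = 46), the usual t test for a Pearson correlation can be sketched as below. The function name is hypothetical, and the source does not state that the authors used this particular test; the sketch simply applies the textbook formula t = r * sqrt(N - 2) / sqrt(1 - r^2), with N - 2 degrees of freedom.

```python
import math


def t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Values reported in the text for Figure 1.
t = t_statistic(-0.708, 46)
print(round(t, 2))

# Standard t tables give a two-tailed critical value of about 3.55 at
# alpha = .001 for 40 degrees of freedom; the value for 44 degrees of
# freedom is slightly smaller, so |t| only needs to exceed roughly 3.5.
print(abs(t) > 3.551)
```

With 46 countries, any correlation of this magnitude lies far beyond the conventional thresholds, which is consistent with the p < .001 reported for all six scatterplots.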

Figure 2 is the scatterplot of science achievement 
scores versus the eighth-grade students’ responses to 
the statement “I enjoy learning science” for the TIMSS 
2003 participating countries. The Pearson correlation 
coefficient was similarly strong and negative, r = -.706, p < .001, 
N = 46. Again, students in the top-performing countries, 
which included Chinese Taipei, Hungary, Korea, 
Netherlands, and Slovenia and Montenegro, generally 
indicated a low level of enjoyment in learning science, 
while students in the low-performing countries, 
including Botswana, Ghana, the Philippines, and 
South Africa, had the highest level of enjoyment in 
learning science. 

Figure 3 is the scatterplot of mathematics 
achievement versus “I usually do well in mathematics” 
for countries that participated in the TIMSS 2003 
Grade 8 study. The corresponding Pearson correlation 
coefficient was r = -.643, p < .001, N = 46. Students in 
the top-performing school systems, such as Chinese 
Taipei, Korea, and Japan, reported the lowest level of 
self-efficacy, while students in the bottom-performing 
countries, such as Ghana, Saudi Arabia, and South 
Africa, reported a relatively higher level of self-efficacy. 



In Figure 4, the scatterplot of science achievement 
versus “I usually do well in science” also demonstrates a 
similar pattern. The corresponding Pearson correlation 
coefficient was r = -.648, p < .001, N = 46. 

Figure 5 is the scatterplot of mathematics 
achievement versus “I learn things quickly in 
mathematics.” The corresponding Pearson correlation 
coefficient amounted to r = -.696, p < .001, N = 46. As 
mentioned earlier, we used this measure as a proxy of 
the perceived rigor of the program. As reflected in the 
figure, Chinese Taipei, Hong Kong SAR, Japan, and 
Korea are high-performing school systems but their 
students were those most likely to feel that they did 
not learn things quickly in mathematics. In contrast, 
the students from a number of low-performing 
countries (lower-right hand corner of the figure) were 
those most likely to think they learned things quickly 
in mathematics. Figure 6 has a very similar pattern to 
that of Figure 5. The corresponding Pearson correlation 
coefficient is r = -.737, p < .001, N = 46. 

Notice that the correlation coefficients among the 
three aggregate measures of self-perception in Tables 
7 and 8 are all positive and are moderate to strong 
in terms of magnitude. Based on the summary results 
from Tables 3 through 8 and the six scatterplots, we 
reject the null hypotheses 5 to 8 and conclude that 
there is some evidence in support of the respective 
alternative hypotheses. 

Figure 1: Scatterplot of Mathematics Achievement versus “I Enjoy Mathematics” for Grade 8 TIMSS 2003 Participating Countries 
Note: Pearson’s correlation = -0.708 (p < 0.001, N = 46). 

Figure 2: Scatterplot of Science Achievement versus “I Enjoy Science” for Grade 8 TIMSS 2003 Participating Countries 
Note: Pearson’s correlation = -0.706 (p < 0.001, N = 46). 

Figure 3: Scatterplot of Mathematics Achievement versus “I Usually Do Well in Mathematics” for Grade 8 TIMSS 2003 Participating Countries 
Note: Pearson’s correlation = -0.643 (p < 0.001, N = 46). 

Figure 4: Scatterplot of Science Achievement versus “I Usually Do Well in Science” for Grade 8 TIMSS 2003 Participating Countries 
Note: Pearson’s correlation = -0.648 (p < 0.001, N = 46). 

Figure 5: Scatterplot of Mathematics Achievement versus “I Learn Things Quickly in Math” for Grade 8 TIMSS 2003 Participating Countries 
Note: Pearson’s correlation = -0.696 (p < 0.001, N = 46). 

Figure 6: Scatterplot of Science Achievement versus “I Learn Things Quickly in Science” for Grade 8 TIMSS 2003 Participating Countries 
Note: Pearson’s correlation = -0.737 (p < 0.001, N = 46). 






C. SHEN & H. P. TAM: RELATIONSHIP BETWEEN STUDENTS’ ACHIEVEMENT AND SELF-PERCEPTION 



In summary, for between-country analyses with 
country as the unit of analysis, there is a negative 
relationship between each self-perception measure and 
each achievement score. These findings were consistent 
for both mathematics and science across all three 
waves of TIMSS data, even though the sample sizes 
(number of countries) and the participating countries 
varied from wave to wave. 

Discussion and conclusion 

The existing motivation and self-efficacy theories 
suggest that there is a positive feedback loop among 
students’ academic achievement, their self-evaluation, 
and their intrinsic interest in the subjects; the results 
found in the within-country analyses support this. 
This is, however, in sharp contrast to the consistent 
finding of a negative association between students’ 
achievement and self-perception at the country 
level across the three waves of TIMSS data. The two 
opposite patterns jointly form an interesting and 
paradoxical phenomenon, for which there are no 
ready theories and easy explanations. Of course, one 
should not interpret the findings from the study as 
encouraging students to develop negative attitudes 
toward mathematics and science and to decrease 
their self-perceived competence in order to raise their 
achievement. One cannot draw causal implications 
from correlational information, and applying a 
country-level relationship to individual students would 
commit an ecological fallacy. The negative 
relationship is found at the country level, not at the 
individual level. Moreover, even the country-level 
results should be interpreted with care. By the same 
token, the negative correlations found in the between- 
country analyses do not contradict the existing 
motivation and self-efficacy theories. These theories, 
as mentioned earlier, operate at the individual level, 
not at the country or culture level. The aggregate 
measures of students’ self-perceptions represent overall 
information and are different from characteristics at 
the individual level. They reflect a specific country’s 
educational, social, and cultural contexts, which 
contribute toward shaping the attitudes, values, and 
beliefs of some individuals in that country. 
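
Why the sign can differ between the two levels may be easier to see from the standard decomposition of a pooled covariance into a within-group part and a between-group part. The sketch below uses hypothetical numbers and hypothetical country labels, not TIMSS data, and it is an illustration of the general statistical identity rather than an analysis performed in this paper.

```python
def mean(values):
    return sum(values) / len(values)


def covariance_parts(groups):
    """groups: {country: [(self_perception, achievement), ...]}.

    Returns (total, within, between) population covariances, where
    total = within + between.
    """
    everyone = [p for pairs in groups.values() for p in pairs]
    n = len(everyone)
    gx = mean([x for x, _ in everyone])
    gy = mean([y for _, y in everyone])
    total = sum((x - gx) * (y - gy) for x, y in everyone) / n
    within = 0.0
    between = 0.0
    for pairs in groups.values():
        cx = mean([x for x, _ in pairs])
        cy = mean([y for _, y in pairs])
        # Deviations of students from their own country's means.
        within += sum((x - cx) * (y - cy) for x, y in pairs)
        # Deviations of country means from the grand means.
        between += len(pairs) * (cx - gx) * (cy - gy)
    return total, within / n, between / n


# Hypothetical data: within each country, higher self-perception goes with
# higher achievement, but the higher-scoring countries have lower mean ratings.
groups = {
    "High": [(1, 560), (2, 580), (3, 600)],
    "Mid": [(2, 470), (3, 490), (4, 510)],
    "Low": [(3, 380), (4, 400), (5, 420)],
}
total, within, between = covariance_parts(groups)
print(round(total, 2), round(within, 2), round(between, 2))
```

In this constructed example the within-country component is positive while the between-country component is negative, so the two levels of analysis answer different questions and neither contradicts the other.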

As mentioned at the beginning of the paper, 
it is widely assumed that a positive self-regard is an 
important motivating force that helps to enhance 
people’s achievement. However, some researchers 
argue, with reference to cross-cultural studies, that 



the need for self-regard is culturally diverse and that 
the perception of oneself and regard for oneself differ 
across cultures. For example, Heine, Lehman, Markus, 
and Kitayama (1999) observed that anthropological, 
sociological, and psychological analyses revealed many 
elements of Japanese culture that are incongruent with 
such motivations. Instead, a self-critical focus is more 
characteristic of the Japanese, whereas the need for 
positive self-regard is rooted more firmly in 
North American culture. The results from this study 
also suggest that students in East Asian countries share 
something in common in terms of self-perceptions. This 
common ground may perhaps be attributable to their 
sharing basically the same Confucian root. Similarly, 
other research has found that East Asian people are, 
due to cultural reasons, more likely than people from 
other cultures to “reduce” themselves in relation to 
other people (see, for example, Stigler, Smith, & Mao, 
1985; Uttal, Lummis, & Stevenson, 1988). Nevertheless, 
despite this special background of students from East 
Asian countries, we found that removing them from 
our analysis (not presented in this paper) changed the 
magnitude of the correlation coefficients only slightly 
and did not change the overall pattern. 

The fact that consistent negative correlations were 
found at one level but positive correlations were found 
at the other level among the same three measures 
of self-perception is reason enough to justify the 
search for a more coherent and holistic explanation. 
Shen and Pedulla (2000) put forward a plausible 
explanation for the negative correlations between 
students’ achievement in mathematics and science and 
their sense of self-efficacy together with their perceived 
easiness of the two subjects. They suggested that low- 
performing countries might have relatively lower 
academic demands and expectations, whereas high- 
performing countries might have higher academic 
demands and expectations. In particular, the aggregate 
measure of students’ perceived easiness of mathematics 
and science may reflect the corresponding strength of 
the curriculum of that country. In turn, 
countries with a demanding curriculum and high 
academic standards in mathematics and science may 
turn out students with high academic achievement 
levels. 

Since the comprehensive analyses by Schmidt, 
McKnight, Valverde, Houang, and Wiley (1997) and 
Schmidt, Raizen, Britton, Bianchi, and Wolfe (1997) 
on the curricula from countries participating in the 
TIMSS study did not provide a list of rankings 
according to the strength of the curricula, it is as 
yet impossible to verify the explanation suggested by 
Shen and Pedulla (2000). However, their proposed 
explanation is consistent with the findings from a 
number of small-scale comparative studies in the past 
(cf. Becker, Sawada, & Shimizu, 1999; Stevenson & 
Stigler, 1992; Stigler et al., 1985). It is also consistent 
with the findings from the TIMSS videotape study 
(Kawanaka et al., 1999), which examined eighth-grade 
mathematics classrooms in Germany, Japan, and the 
United States. The convergent findings from the three 
waves of TIMSS data furnish further evidence in favor 
of the academic strength explanation. 

We suggest that the negative correlations are, to 
a certain extent, indicative of the overall relationship 
between the rigor of academic standards and 
expectations and the achievement of students in 
mathematics and science. Students from such high- 
performing countries as Japan and South Korea usually 
indicate a relatively low level of enjoyment or liking of 
the two subjects and a lower level of self-evaluation, 
and they perceive the two subjects as hard and not 
easily or quickly learned. Conversely, students from 
low-performing countries, such as South Africa and 
Morocco, tend to indicate that they enjoy learning 
the two subjects, do fairly well in them, and consider 
the subjects as easy and ones in which they can learn 
things quickly. For some middle school students, high 
academic expectations or standards may stimulate 
their intrinsic interest in learning, but, for many 
others, demanding standards and rigorous curriculum 
may lead to resentment toward the two subjects. In 
countries where the expectation is low, the students, 
unlike their high-performing foreign counterparts, 
might have less motivation and set lower goals to 
improve their performance since they perceive their 
performance to be fairly acceptable already. If they 
believe that they are doing well and that mathematics 
and science are easy for them, they would see no need 
to study harder in these areas or to invest greater effort 
in them. 

We believe that the policy implication from this 
study and similar previous studies points to the benefit 
of gradually raising the academic standards and 



expectations in countries, including the United States, 
where performance is, relatively speaking, mediocre or 
unsatisfactory. However, a country’s specific historical, 
sociocultural, and economic environment affects and 
even constrains its academic standards and curriculum. 
Therefore, we do not imagine the achievement 
problem can be solved by simply copying a rigorous 
curriculum. An understanding of the prevalent beliefs 
and attitudes with respect to education in a specific 
society and culture is deemed necessary in order for 
such a policy to be beneficial and effective. 

When we examine the six scatterplots, we find that 
although all the negative correlations are statistically 
significant, there may be trends other than the linear 
one. Although the several East Asian school systems 
did well in TIMSS, and their aggregate levels of self- 
perception were relatively low, there was also variation 
within this group. For example, students in Japan and 
South Korea typically had the lowest aggregate level 
of enjoyment of the two subjects and the lowest level 
of self-evaluation. Furthermore, they both perceived 
the two subjects as being difficult. In comparison, the 
average level of self-perceptions by the top-performing 
Singaporean students was more positive than those of 
the Japanese and South Korean students. Singapore’s 
school system might therefore be a better model for 
other school systems to follow than those of Japan and 
South Korea. An investigation of how middle school 
students in Singapore develop such a relatively positive 
attitude toward the two subjects and confidence in 
their ability despite the rigor of the curricula is beyond 
the scope of this study, but it is worth undertaking. 
Last, but not least, we should point out that the 
variation evident in this study suggests that there are 
limitations in using academic standards alone to explain 
the negative correlation between achievement and self- 
perception at the country level. For a full explanation 
of this paradoxical relationship, we will need to 
examine other cultural and social factors as well. For 
example, in recent years, many countries around the 
world have taken on various reform efforts and policy 
changes in education. Hence it will be interesting to 
see if the patterns reported here will again show up in 
the TIMSS 2007 study. Should there be any changes, 
one can then investigate to see if such changes can be 
attributed to the reform efforts undertaken. 
Abstract 

The Trends in International Mathematics and Science Study (TIMSS) 
is the largest and most ambitious study undertaken by 
the International Association for the Evaluation of 
Educational Achievement (IEA). TIMSS provides a 
tool for investigating student achievement and school 
effectiveness, taking into account the varying influences 
of instructional contexts and practices and home 
environment. Schools vary widely in terms of the average 
achievement of their students in mathematics. Thus, it is 
of great interest for policymakers worldwide to identify 
factors that distinguish higher performing schools from 
lower performing schools. The aim of the analysis was to 
find indicators related to schools that differentiate between 
these two groups of schools. For this study, a more effective 
school was one where the school achievement score was 



higher than the score that would be predicted from the 
student characteristics. Data were obtained from 3,116 
students, a number that represented 31.8% of the entire 
population (9,786). Analysis of the differences between 
the predicted and achieved scores led to identification of 
schools that performed better than would be expected 
given the home circumstances of their students. From 
this analysis, six factors were found to account for school 
differences that relate to mathematics achievement. The 
factor that accounted for the greatest differences between 
the more effective and less effective schools was passive 
learning, while the second factor was active learning. The 
third related to self-perception, and the fourth factor was 
student attitudes toward mathematics. The remaining 
two factors were family incentives and class climate. 
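
The definition of a more effective school used here, one whose achieved score exceeds the score predicted from student characteristics, can be illustrated with a minimal sketch. The abstract does not specify the prediction model, so the single hypothetical "background index," the simple least-squares line, the function names, and the data below are all assumptions made for illustration only.

```python
from collections import defaultdict


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - slope * mx, slope


def school_residuals(students):
    """students: (school, background_index, math_score).

    Returns the mean of (achieved - predicted) for each school.
    """
    intercept, slope = fit_line(
        [s[1] for s in students], [s[2] for s in students]
    )
    residuals = defaultdict(list)
    for school, background, score in students:
        residuals[school].append(score - (intercept + slope * background))
    return {school: sum(v) / len(v) for school, v in residuals.items()}


# Hypothetical students in two schools with identical background profiles.
students = [
    ("S1", 1.0, 430), ("S1", 2.0, 470), ("S1", 3.0, 510),
    ("S2", 1.0, 390), ("S2", 2.0, 430), ("S2", 3.0, 470),
]
effects = school_residuals(students)
more_effective = [school for school, e in effects.items() if e > 0]
print(effects, more_effective)
```

A positive mean residual marks a school that does better than its intake would predict; in the hypothetical data both schools serve the same mix of students, so the entire difference between them shows up in the residuals.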



Introduction 

Mathematical skills are critical to the economic progress 
of a technologically based society, which is why many 
countries question what their school-age populations 
know and can do in mathematics. More specifically, 
they want to know what concepts students understand, 
how well they can apply their knowledge to problem- 
solving situations, and whether they can communicate 
their understanding. Of even greater importance is 
their desire to know what they can do to improve 
students’ understanding of mathematical concepts, 
their ability to solve problems, and their attitudes 
toward learning (Beaton, Mullis, Martin, Gonzales, 
Kelly, & Smith, 1996). Mathematics achievement is 
a significant factor in decisions concerning placement, 
promotion, and selection in almost all education 
systems (Nasser & Birenbaum, 2005), and its 
importance is confirmed by the number of countries 
that participate in international mathematics studies 
like those conducted by the International Association 
for the Evaluation of Educational Achievement (IEA) 
and the Organisation for Economic Co-operation 



and Development (OECD). The findings of these 
international studies, as well as national surveys, are 
valuable tools for educators and policymakers (Grobler, 
Grobler & Esterhuyse, 2001; Nasser & Birenbaum, 
2005; Secada, 1992). 

The Trends in International Mathematics and 
Science Study (TIMSS) is one of the most ambitious 
series of studies undertaken by the IEA. TIMSS 
provides a tool to investigate both student achievement 
and school effectiveness, taking into account the 
varying influences of instructional contexts, practices, 
and home environment. The study’s global focus and 
its comparative perspective give educators valuable 
insight into what is possible beyond the confines of 
their national borders. Data from TIMSS make it 
possible to examine differences in current levels of 
performance in relation to a wide variety of variables 
associated with the classroom, school, and national 
contexts within which education takes place. Because 
the IEA studies present objective information on 
student performance from different countries and 
cultures, international and national policymakers 
and educators are provided with an important data 
source. Data from IEA studies provide solid evidence 
for the feasibility and efficacy of educational policies, 
curriculum, and teaching practices (Mullis et al., 
2000). Most international studies on education focus 
on students’ academic outcomes, and the main reason 
for this is societal demand for academic achievement 
(Gadeyne, Ghesquiere, & Onghena, 2006). Such 
studies also give direction to policymakers who want 
to identify the characteristics of schools so they can 
more effectively plan improvement strategies (Brown, 
Duffield, & Riddell, 1995). 

School effectiveness research has flourished since 
1979, and has attracted considerable political support 
in several countries (Luyten, Visscher, & Witziers, 
2005). Studies on school effectiveness identify school 
characteristics that optimize particular learning 
outcomes, school improvement factors, and processes 
that establish these effectiveness-enhancing factors 
(Scheerens & Demeuse, 2005). This kind of research 
aims to tease out the factors that contribute to effective 
education and especially those that schools can 
implement (Creemers & Reezigt, 2005). Research on 
school effectiveness pinpoints those characteristics or 
factors that are important for effectiveness at different 
levels of the system (i.e., student, learning, teaching, 
and school), and identifies school achievement in 
relation to basic cognitive skills. School effectiveness 
research also highlights the characteristics of schools 
and classrooms that are associated with differences 
in school effectiveness. If we know the particular 
characteristics of an effective school, especially 
those relating to the sphere of features that could 
be changed, then we are in a position to improve 
underperforming schools by encouraging them to 
adopt those characteristics (Luyten et al., 2005). 
Another objective of school effectiveness research is 
to increase the potential that schools have to improve 
education and especially educational achievement. In 
other words, school effectiveness research aims to find 
out what works in education and why (Creemers & 
Reezigt, 2005). 

Many researchers have focused on the fact that 
the composition of the student body has a substantial 
impact on achievement over and beyond the effects 
associated with students’ individual abilities and social 
class. Other researchers support the claim that schools 
with low social-class intakes have certain disadvantages 



associated with their context (Baumert, Stanat, & 
Watermann, 2005; Opdenakker & Van Damme, 
2005; Van de Grift & Houtveen, 2006; Wilms, 
1992). Other studies argue that the factors that most 
influence performance are the teaching and learning 
process and the creation of a learning environment. 
They argue that schools with high achievement are 
characterized by clear, well-organized teaching that 
motivates students and connects to their background 
knowledge and that keeps students actively involved 
in the learning process and their lessons — lessons 
that are efficiently organized and well structured. 
Recent research reveals that the main problem of 
underperforming schools is that their students are not 
given sufficient opportunity to attain the minimum 
objectives of the curriculum. Van de Grift and 
Houtveen (2006), for example, point to mathematics 
textbooks that are not suitable for attaining the basic 
objectives of the curriculum, insufficient time allotted 
for learning and teaching, and teaching that is poor and 
does not stimulate students. These two researchers also 
found that student performance in underperforming 
schools improved when the teaching was improved, 
the class was better organized, and the students were 
kept actively involved. 

A study by Stoll and Wikeley (1998) indicates 
that school improvement efforts in recent years have 
increasingly focused on effectiveness issues such as 
the teaching and learning processes and student 
outcomes. This focus on school improvement has led 
to more research into the factors that make a school 
effective (MacBeath & Mortimore, 2001; Reynolds & 
Stoll, 1996). For school improvement to be successful, 
certain characteristics of the school atmosphere 
must be favorable. For example, a school and its 
students must have common goals and the school 
must feel responsible for its students’ success. Other 
requirements are mutual respect and support and a 
positive attitude toward learning (Creemers & Reezigt, 
2005). Behavioral theorists agree that schools will not 
change if the staff within the schools — the teaching 
staff especially — do not change. Three mechanisms 
that can bring about change are evaluation, feedback, 
and reinforcement. These mechanisms explain and can 
therefore be used to improve effective instruction in 
the classroom (Creemers, 1994). 

Schools vary in terms of their students’ average 
achievement in mathematics. In general, the student 
intakes of schools produce differences in outcome 
that are not caused by school processes. For this 
reason, it is necessary, before comparing schools, to 
correct for student intake. Factors considered relevant 
in this respect are the socioeconomic status and the 
educational background of the students’ families. 
School performance is usually expressed in terms 
of average student achievement by school. These 
measures ideally include adjustments for such 
student characteristics as entry-level achievement 
and socioeconomic status in order to determine the 
value added by a school. Their main goal is to identify 
the factors that lead to the best results (Luyten et al., 
2005). The student populations of schools can differ 
considerably in the proportion of students from homes 
with particular characteristics. The extent to which and 
how the home situation affects educational achievement 
has received much attention (Papanastasiou, 2000, 
2002; Schreiber, 2002). Moreover, the extent to which 
schools vary in effectiveness and the school factors 
that seem to promote effectiveness are academically 
interesting. 

Figure 1 presents a simple model that clearly 
illustrates how mathematics performance is influenced 
indirectly by intake input and directly by the school 
environment. The educational background of the 
family, the size of a student’s home library, and the 
socioeconomic status of the family are three factors 
that could be included in the school intakes category. 
These factors are assumed to have a direct impact on 
processes within the school as well as on the general 
performance of the school. 

With its basis in school effectiveness research, this 
present study investigated achievement in schools in 
relation to the factors that enhance school performance. 
The main research question is: Why do students at some 
schools learn much less than would be expected on the 
basis of their family background? The aim of the study 
was to find out whether a set of indicators from the 
student TIMSS questionnaire for Grade 8 of the lower 
secondary school was responsible for differences in 
academic achievement. In other words, we tried to 
find out if specific characteristics are associated with 
students’ academic achievement. 



Figure 1: Influence of Student Intake and School 
Environment on Mathematics Performance 




Method 

The study focused on the TIMSS 1999 sample, the 
population of which included all students enrolled in 
Grade 8 in the 1998/99 school year. The participating 
students completed questionnaires on home and school 
experiences related to learning mathematics, and school 
administrators and teachers answered questionnaires 
regarding instructional practices (Beaton et al., 
1996). In Cyprus, all 61 gymnasia participated in this 
project (the entire population of schools), with two 
Grade 8 classes from each school. Within each class, 
all students were tested, and achievement tests and 
other data were obtained. Data were obtained from 
3,116 students, which represented 31.8% of the entire 
population (9,786). However, only the responses of 
students who had completed all the questions were 
used, which led to listwise deletion of some subjects 
from the data set. The average age of 
students tested was 13.8 years. 

The study analyzed data from the student 
questionnaire and the mathematics tests to find 
certain school indicators that differentiate between 
more effective and less effective schools. For this study, 
a more effective school was designated as one where 
the school achievement score was higher than the 
mean score predicted from the student characteristics 
(Postlethwaite & Ross, 1992). In the same way, a less 
effective school was one for which the school mean in 
mathematics was lower than the mean expected. Based 
on the differences between the predicted scores and the 
actual scores, the residuals that distinguish the more 
effective from the less effective schools were identified. 
Analysis of the differences between the predicted and 
the achieved scores in terms of school quality led to 
identification of schools that performed better than 
would be expected given the home circumstances 
of their students. In total, seven steps were followed 
during identification of the factors distinguishing the 
more from the less effective schools (Postlethwaite & 
Ross, 1992). 

Step 1: This first step involved identifying the 
measures related to home characteristics, given these 
are thought to affect student achievement. Three 
factors were identified from the TIMSS student 
questionnaire: the economic status of the family, the 
educational background of the family, and the size 
of the home library. For the first factor, 13 measures 
related to the economic status of the family, six of which 
concerned items or facilities in the students’ homes: 
central heating, washing machine, air-conditioning, 
more than two cars, computer, and more than four 
bedrooms. The second factor related to the size of the 
library at home. More specifically, students were asked 
about the number of books in their homes, excluding 
magazines and school books. The third factor was the 
highest education level of their parents. 

Step 2: A regression analysis was run in which the 
dependent variable was mathematics achievement, 
and the independent variables were the three above- 
mentioned factors (parents’ educational background, 
the size of the home library, and economic status). The 
students placed above the regression line were those 
students with mathematics scores higher than would 
be expected. The students placed below the regression 
line were those who achieved lower scores than would 
be expected. 

Step 3: In the third step, the residual scores were 
calculated. By residuals, we mean the differences 
between the actual scores and the predicted scores. 
Students with positive residuals were those students 
whose achievement was higher than would be expected, 
and students with negative residuals were those whose 
achievement was lower than would be expected. The 
residual scores of the students were then averaged within 
each school. The schools with positive mean residuals 
were considered the more effective schools and the 
schools with the negative mean residuals were deemed 
the less effective schools. 
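
Steps 2 and 3 can be sketched in code. The following is a minimal illustration under stated assumptions, not the authors' own computation: it fits an ordinary least squares regression of mathematics score on three home-background predictors, takes each student's residual (actual minus predicted), and averages the residuals within each school. The function names (`solve_linear_system`, `fit_ols`, `school_mean_residuals`) and the six student records are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(predictors, outcome):
    """Return [intercept, b1, b2, ...] from the normal equations."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def school_mean_residuals(school_ids, predictors, outcome):
    """Average (actual - predicted) within each school."""
    coef = fit_ols(predictors, outcome)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sid, p, y in zip(school_ids, predictors, outcome):
        predicted = coef[0] + sum(b * x for b, x in zip(coef[1:], p))
        totals[sid] += y - predicted
        counts[sid] += 1
    return {sid: totals[sid] / counts[sid] for sid in totals}


# Made-up records: (education, library, economic status) and math score.
schools = ["A", "A", "A", "B", "B", "B"]
home = [(1, 2, 1), (2, 1, 3), (3, 3, 2), (1, 1, 2), (2, 3, 3), (3, 2, 1)]
scores = [455, 470, 500, 430, 465, 480]
means = school_mean_residuals(schools, home, scores)
```

Because the model includes an intercept, the student residuals sum to zero over the whole sample, so school mean residuals necessarily fall on both sides of zero.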

Step 4: In this step, the schools were ranked from 
the most effective to the least effective school. The 
schools with average residuals > +0.10 and the schools 
with average residuals < -0.10 were then selected. 
Our purpose was to select the schools at the extremes, 
as we considered these schools would give us a more 
reliable account of the factors determining school 
effectiveness. 
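
The selection rule in Step 4 amounts to ranking schools by mean residual and keeping only those beyond a cutoff on either side. A small sketch follows, assuming the mean residuals are held in a dictionary keyed by school; the function name `split_extreme_schools` and the example values are hypothetical and are not data from the study.

```python
def split_extreme_schools(mean_residuals, cutoff=0.10):
    """Rank schools by mean residual and keep those beyond +/- cutoff."""
    ranked = sorted(mean_residuals.items(), key=lambda kv: kv[1], reverse=True)
    more_effective = [sid for sid, r in ranked if r > cutoff]
    less_effective = [sid for sid, r in ranked if r < -cutoff]
    return more_effective, less_effective


# Hypothetical school-level mean residuals (not values from the study).
example = {"S1": 0.42, "S2": 0.05, "S3": -0.31, "S4": -0.08, "S5": 0.15}
more, less = split_extreme_schools(example)
```

Schools whose mean residual lies between the two cutoffs (here S2 and S4) drop out of the comparison.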

Step 5: During this fifth step, we tried to choose 
the indicators that educational authorities have under 
their control and that influence student achievement. 
We used the following criteria to select the indicators 
for further analysis: (i) we accepted those indicators 
whose correlations with the residuals were 
statistically significant; and (ii) we excluded those 
variables that were not related to mathematics. 

Step 6: These criteria allowed us to identify 26 
variables, which we then grouped in seven categories: 
passive learning, active learning, self-perception, 
attitudes, family incentives, class climate, and external 
incentives. 

Step 7: For this step, we calculated z-scores and 
used the t-test for the final analysis. The reason for 
calculating the z-score was to place all indicators 
on the same scale to facilitate interpretation of the 
difference in mean scores between the more and the 
less effective schools. We then summed up the values 
of the indicators to produce a composite value, and 
standardized the values of each of the seven factors to a 
mean of zero and a standard deviation of one. Finally, 
we used the t-test to assess the mean differences 
between the more effective and the less effective 
schools. 
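
The standardize, sum, re-standardize, and compare sequence of Step 7 can be illustrated as follows. This is a sketch only: the paper does not say whether a pooled-variance or unequal-variance t-test was used, nor which standard deviation formula was applied, so the pooled (Student) form and the population standard deviation are assumptions here. The function names and the toy indicator values are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev, variance


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite(indicator_columns):
    """Sum z-scored indicators per student, then re-standardize the sum."""
    standardized = [z_scores(col) for col in indicator_columns]
    sums = [sum(vals) for vals in zip(*standardized)]
    return z_scores(sums)


def pooled_t(group_a, group_b):
    """Student's t statistic for two independent groups (equal variances assumed)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Toy data: two indicators for six students; the first three students are
# taken to attend more effective schools, the last three less effective ones.
indicator_1 = [4, 3, 4, 2, 1, 2]
indicator_2 = [3, 4, 4, 1, 2, 2]
factor = composite([indicator_1, indicator_2])
t_value = pooled_t(factor[:3], factor[3:])
```

Putting every composite on a mean-zero, unit-variance scale is what allows the mean differences for different factors to be read side by side.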

Results 

This study aimed to determine the factors that 
distinguish schools as more effective or as less effective, 
specifically in mathematics. From Table 1, we can see 
that the composite measures of economic status of 
the family and educational background, as well as the 
variable “home library,” correlated with achievement 
in mathematics. All three correlations were positive. 
The highest correlation was between educational 
background and achievement ($r_{\text{education,math}} = 0.35$), 
followed by the size of the home library ($r_{\text{library,math}} = 0.25$) 
and then economic status ($r_{\text{economic,math}} = 0.21$). 
These correlations indicate that the higher the 
educational background of the family, the larger the 
size of the family’s home library, and the higher the 
economic status of the family, the more likely it is 
that the son or daughter will have a higher level of 
achievement in mathematics. 

Table 2 shows the regression equation of the three 
composites as independent factors, and mathematics 
achievement as the dependent variable. The regression 
analysis was based on the hypothesis that mathematics 
achievement is a function of parents’ education, size of 
the home library, and the economic status of the family. 
We can see from the equation that the most significant 
factor in predicting mathematics achievement was the 
educational background of the parents. For all three 
independent factors, the contribution of variance 
to the prediction of mathematics achievement was 
statistically significant, although not high ($R = 0.377$, 
$R^2 = 0.142$). 

Table 1: Correlations between Mathematics Achievement and Parents’ Educational Background, Size of Home Library, and Family’s Economic Status 

r                         Educational background   Size of library   Economic status
Mathematics achievement   0.35*                    0.25*             0.21*

Note: *p < 0.000. 



Table 2: Regression Equation for Predicting Mathematics Achievement 

\[
\text{Predicted math achievement} = 0.28\,(\text{educational background}) + 0.12\,(\text{size of library}) + 0.07\,(\text{economic status})
\]

$R = 0.377$ 
$R^2 = 0.142$ 
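
As a quick consistency check on the reported fit statistics, the squared multiple correlation should equal the square of the multiple correlation:

```latex
\[
R^2 = (0.377)^2 \approx 0.142
\]
```

The two reported values agree, and together they indicate that the three home-background composites account for roughly 14% of the variance in mathematics achievement.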



Figure 2 presents the position of the 61 schools 
based on their achievement and their average 
residuals. This graph allows us to compare schools 
that were more effective than we might have supposed 
from their students’ mathematics achievement. For 
example, although school 25 had about the same 
average achievement ($\bar{X} = 449$) as school 57 ($\bar{X} = 451$), 
school 25 was more effective than school 57. School 25 had 
a positive average residual (+11) and school 57 had a 
negative average residual (-21). Furthermore, although 
school 25 had a lower mathematics achievement ($\bar{X} = 449$) 
compared with school 55 ($\bar{X} = 496$), it was a 
more effective school. The average residual for school 
55 was negative (-6). 
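
Because a residual is defined in Step 3 as the actual score minus the predicted score, this comparison can be restated in terms of the predicted school means implied by the reported figures. The following is an illustrative back-calculation only; it assumes that the average residuals quoted here are on the same scale as the school means, and the predicted values themselves are not reported in the paper.

```latex
\begin{align*}
\text{predicted mean} &= \text{actual mean} - \text{average residual} \\
\text{School 25:}\quad 449 - 11 &= 438 \\
\text{School 57:}\quad 451 - (-21) &= 472 \\
\text{School 55:}\quad 496 - (-6) &= 502
\end{align*}
```

On this reading, school 25 exceeded the mean predicted from its intake, whereas schools 57 and 55 fell short of theirs despite equal or higher raw means.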

Figure 3 presents the schools after the exclusion of 
schools with small average positive or negative residuals. 
More specifically, the schools with average residuals 
between -0.10 and +0.10 were excluded, leaving those 
with average residuals > +0.10 or < -0.10. In total, 
33 schools out of 61 were retained, 16 of which had 
positive residuals and 17 of which had negative residuals. 

Figure 2: Position of Schools Based on the Average Mathematics Achievement and on the Average Residuals of their Students 

Figure 3: The Remaining Schools for Further Statistical Analysis 

Table 3 presents the seven categories of indicators 
that were selected for further analysis, and the 
corresponding indicators and r values between 
indicators and residuals. The significance of r was the 
criterion for selecting indicators. The seven composites 
were used to distinguish the more from the less effective 
schools. Table 4 presents the factors, the t-test values, 
the significance level, and the mean differences for the 
two groups of schools — the more effective and the less 
effective. 

Our analysis revealed six factors explaining school 
differences in mathematics achievement. The most 
influential factor was passive learning. By passive 
learning, we mean that the teacher shows students how 
to solve mathematics problems, students copy notes 
from the board, the teacher uses the board during 
teaching, and the teacher uses an overhead projector 
during teaching. The second factor was active learning. 
Active learning is the opposite side of passive learning. 
In active learning, students work on mathematics 
projects, they use things from everyday life in solving 
mathematics problems, they work together in pairs or 
small groups, and they try to solve examples related to 
a new topic. When we look at these two factors, the 
most important among the seven, we see they are two 
sides of the same coin. For both factors — active and 
passive learning — the probability level was negligible, 
which led us to believe that these two factors are what 
really make the difference between the two groups. 

The third factor that distinguished the two groups 
of schools was self-perception. Self-perception has 
been defined as individuals’ beliefs regarding their 
performance capabilities in a particular context, or on 
a specific task or domain (Bandura, 1997). The beliefs 
that were included in this factor were the students’ 
beliefs that mathematics is difficult, mathematics is 
not one of their strengths, mathematics is not an easy 
subject, and that they are not talented in mathematics. 
This factor also had a strong influence on differentiating 
the two groups. 

The next factor that distinguished the two groups was 
the attitudes of students toward mathematics. Positive 
attitudes were signified by the students saying they 
enjoyed learning mathematics, they liked mathematics, 
and they would like a job involving mathematics. 
Students with positive attitudes were a characteristic 
of the group of effective schools. 

The remaining two factors distinguishing the 
more effective and the less effective schools were 
family incentives and class climate, with the levels 
of significance at 0.045 and 0.05, respectively. These 
levels indicate that the differences between the more 
and the less effective schools on these two factors were 
marginal. The only factor that did not show a 
statistically significant difference between the more 
and the less effective schools was external incentives. 
Here, the t-test value was small (t = 1.31) and the 
probability level high (p = 0.19). 

Table 3: Composites, Indicators, and r Values with Residuals 

Factors                 Indicators                                                                r*
1. Self-perception      1.  I would like mathematics much more if it were not so difficult        .293
                        2.  Mathematics is more difficult for me than for many of my classmates   .473
                        3.  I am just not talented in mathematics                                 .399
                        4.  When I do not understand a new topic in mathematics initially,
                            I know that I will never understand it                                .254
                        5.  Mathematics is not one of my strengths                                .369
                        6.  Mathematics is not an easy subject                                    .168
2. Attitudes            7.  I enjoy learning mathematics                                          .181
                        8.  Mathematics is not boring                                             .241
                        9.  I would like a job that involved using mathematics                    .180
                        10. I like mathematics                                                    .295
3. External incentives      I need to do well in mathematics:
                        11. To get the job I want                                                 .052
                        12. To please my parents                                                  .233
                        13. To get into the school of my choice                                  -.106
                        14. To please myself                                                     -.049
4. Passive learning     15. The teacher shows us how to do mathematics problems                   .066
                        16. We copy notes from the board                                          .109
                        17. The teacher uses the board                                            .157
                        18. The teacher uses an overhead projector                                .225
5. Active learning      19. We work on mathematics projects                                       .312
                        20. We use things from everyday life in solving math problems             .069
                        21. We work together in pairs or small groups                             .106
                        22. We try to solve examples related to new topic                         .253
6. Family incentives    23. Mother thinks it is important to do well in mathematics               .161
                        24. I think it is important to do well in mathematics                    -.174
7. Class climate        25. In my mathematics class, students are orderly and quiet               .107
                        26. In my mathematics class, students do exactly as the teacher says      .105

Note: *p < 0.05. 

Table 4: Rank Order of Factors Distinguishing the More from the Less Effective Schools 

Factors               t-test   p       Mean differences of z-scores
Passive learning      7.70     0.00    0.35
Active learning       7.41     0.00    0.33
Self-perception       4.95     0.00    0.23
Attitudes             2.61     0.01    0.12
Family incentives     2.00     0.045   0.09
Class climate         1.96     0.05    0.09
External incentives   1.31     0.19    0.06

Conclusion 

For Cyprus, participating in the IEA studies is of 
fundamental importance. Findings from these studies 
allow educational authorities to make cross-national 
comparisons of achievement, while the quality of the 
data enables in-depth analyses of the national results 
in an international context (Gonzales & Miles, 2001). 
This present article discussed important points of 
Cyprus’s mathematics education. The conceptual 
framework of the study described in this article was 
based on instructional practices applied in mathematics 
teaching, seen from the students’ perspectives, together 
with some background factors. The purpose of this 
study was to find the school indicators that differentiate 
more effective from less effective schools. For this 
reason, the analysis was based on the residuals, which 
present the differences between the actual mathematics 
scores and the predicted mathematics scores. 

Six factors were found to distinguish the more from 
the less effective schools: passive learning, active 
learning, self-perception, attitudes, family incentives, 
and class climate. In our analysis, we tried to provide 
insight into the characteristics of classrooms that are 
associated with differences in school effectiveness. Such 
knowledge is often regarded as a potential foundation 
for school improvement interventions. If we know the 
features of effective schools, we can improve the lower 
performing schools by encouraging them to adopt the 
characteristics of effective schools. 

The results of this research corroborate findings 
of other studies on school effectiveness. Differences 
were found between more effective and less effective 
schools, with the more effective schools exhibiting 
these characteristics: 

• Teaching is clear, well organized, and keeps students 
actively involved; 

• Class climate is safe and orderly; 

• Students are stimulated by (receive incentives from) 
their families; 

• Students have positive attitudes toward mathematics; 
and 

• Students hold positive beliefs regarding their 
performance capabilities in mathematics. 

The contribution of this study is significant in 
that it was conducted in a country where all Grade 
8 students follow the same mathematics curriculum. 
Our analysis revealed, however, two distinctly different 
learning environments. The findings that passive and 
active learning, self-perception, attitudes, and class 
climate have substantial effects on differentiating 
schools, at least in terms of mathematics achievement, 
carry major implications for mathematics education 
because all these variables are amenable to change 
through instruction. If less effective schools are to 
be more effective, they need to take account of all 
these educational interventions, and to take into 
consideration all of the factors underlying mathematics 
achievement. 

Researchers suggest that students’ self-perceptions/ 
expectations are a major determinant of goal setting, 
and confirm that self-perceptions/beliefs can predict 
students’ mathematics performance (Bandura, 1997; 
Pajares & Graham, 1999). The positive relationship 
between attitudes and mathematics achievement 
is well documented (MacLean, 1995). The general 
relationship between attitudes and achievement is 
based on the concept that the more positive an attitude 
a student has toward a subject, the more likely it is 
that he or she will reach a high level of performance. 
Ma (1997) observed significant positive relationships 
between students’ achievement in mathematics and 
their statements that mathematics was important and 
that they enjoyed the subject. Researchers have 
also found that parental stimulation is another factor 
characterizing effective schools across many countries 
(Guzel & Berberoglu, 2005), a finding confirmed in 
our study. 

The results of this analysis on school effectiveness 
contribute to a fuller understanding of the complicated 
issue of school improvement. However, the area 
of educational effectiveness still demands further 
theoretical and empirical research. Important issues 
that require further research are outcomes, inputs, and 
the learning process, and ideas on how we can promote 
an active learning environment in the classroom and 
in schools. 

Abstract 

This paper considers change in the mathematics 
achievement of basic school students from 1995 to 
2003 in Lithuania. The analysis draws on data from 
TIMSS 1995, 1999, and 2003. The TIMSS cycles and 
the scaling methodology used for calculating the scores 
provide opportunity for participating countries not only 
to compare their results with results from other countries, 
but also to track the changes in their students’ achievement 
across the years. This facility is of particular importance 
to countries experiencing considerable changes in their 
education systems. Lithuania is one such country, as it 
has undergone considerable educational reform since the 
early 1990s. Participation in the aforementioned three 
TIMSS cycles provided Lithuania with a reliable means 
of measuring the impact of reform as it related to the 
mathematics achievement of students in Grade 8 of the 
basic school. The analysis described in this paper involved 
content analysis and classical statistical investigation. (The 
main statistical software used was SPSS 12.0.) 



Introduction 

Over the last century, educational reforms in various 
countries have become a part of the daily routine 
of educational institutions (Horne, 2001; Kinsler 
& Gamble, 2001). Many researchers examine the 
results of these reforms in various countries. Some 
praise the reforms (Draper, 2002; Gamoran, 1997); 
others say that the reforms have not had the desired 
results (Horne, 2001). When considering reforms 
in mathematics curricula, some researchers point to 
positive impacts. These include: 

• Students finding mathematics more interesting to 
learn when the subject is connected to real-life and 
work contexts (Nicol, Tsai, & Gaskell, 2004); 

• Girls’ attitudes to and performance in mathematics 
improving when extra attention is given to teaching 
girls this subject (Richardson, Hammrich, & 
Livingston, 2003); 

• Students gaining a better understanding of algebra 
following changes to the content of mathematics 
lessons (Krebs, 2003); 

• Students raising their achievement scores in 
mathematics following a change from traditional 
lecture-type teaching methods to active and 
problem-solving methods (Sawada et al., 2002). 

However, other researchers claim that, despite 
considerable efforts to reform the content of 
mathematics education, teaching methods, and 
instructional aids, these efforts often do not have the 
desired results (Vann, 1993). The desire to reform 
mathematics programs has come not only from the 
expectations that schools now have for higher student 
achievement in mathematics as a result of their 
respective country’s overall program of educational 
reform (Kelly & Lesh, 2000), but also, and more usually, 
from students’ generally low level of mathematics 
achievement (Betts & Costrell, 2001; Frykholm, 2004; 
Hess, 2002). Thus, one of the main goals of the reform 
has been to improve students’ achievement in this 
subject, and the success of the reform has been, not 
surprisingly, frequently measured by the changes to 
students’ achievement scores in mathematics (Finnan, 
Schnepel, & Anderson, 2003). 

It is not easy, though, to measure improvement 
(Sawada et al., 2002), especially over a short time 
period, as is usually wanted (Grissmer & Flanagan, 
2001). The reasons vary. For example, the high 
standards expected of reformed mathematics programs 
can force teachers to teach students only the topics 
that will be tested and can lead students to cheat during 
tests (Hess, 2002). Although increases in student 
achievement may be observed soon after a reform 
has been put in place, this improvement tends to be 
short-lived, with achievement frequently reverting to 
previous levels — and often in line with the normal 
distribution curve — after a few years (Vann, 1993). 

Many researchers claim that reforms do not produce 
any positive changes in the students’ mathematics 
achievements, and can even worsen their achievement 
(Alsup & Sprigler, 2003). They explain this in terms 
of reforms introduced too quickly and/or the reform 
tried in only a few schools, over a very short period. 
In these cases, the intended guidelines of the reform 
are not sufficiently well grounded. Sometimes the 
results expected from the reform are immeasurably 
high (Gordon, 2004) or the goals of the reform are so 
unrealistic, such as “graduates who know mathematics 
better than graduates of all other countries in the 
world” (Hill & Harvey, n.d.), that measurement of 
changes in achievement becomes pointless. Also, it can 
be fruitless to try to improve students’ mathematics 
results because of incompatibility between the intent 
or theory of the reform and actual practice in the 
classroom. Moreover, teachers may be reluctant or 
not have the competence to embrace the ideas of the 
reform and to implement them (Finnan et al., 2003; 
Kyriakides, 1997). 

The majority of researchers who say the reforms 
rarely lead to the expected improvements in 
achievement consider this is because the reforms 
focus only on the changes within the classroom and 
the school education environment. However, schools 
are not islands (Fullan, 2003), and the learning 
achievements of students strongly relate to their home 
socio-educational environments, as well as to other 
non-school environments (Cohen & Hill, 2000; Green, 
1987; Rotberg, 2000; Viadero, 1996). These researchers 
also suggest that student achievement depends more 
on these non-school than in-school environments 
(Barton, 2001; Coleman, cited in Edmonds, 1987). 
As Barton (2001) points out, although the school 
itself naturally influences student achievement (see 
also Fullan, 1998), it is quite unrealistic to expect the 
school to have the sole influence. 

To recap, of the many researchers who have 
analyzed the influence of educational reform on 
students’ mathematics achievement, some see positive 
outcomes, but others think that, for various reasons, 
the reforms produce limited or no results in the long 
run. They stress that educational reforms tend to be 
associated with the schooling environment and fail to 
recognize that their effectiveness (in terms of student 
achievement) also depends on the students’ home 
socio-educational environments. Because every country has 
its own specific schooling and social environments, it is 
worth analyzing specific countries’ specific educational 
reforms and the results of those reforms. Lithuania, 
a country that has been implementing educational 
reform for some time, provides a case in point. 

After Lithuania claimed independence in 1991, 
radical changes in society made it necessary to make 
changes to the education system (Zelvys, 1999). These 
included rewritten study programs and educational 
standards, new textbooks, and modified teaching 
priorities and goals. A more modern stance began 
to inform teaching and learning. By drawing on the 
experience of other countries, Lithuania is endeavoring 
to form a national integrated system of schooling. It 
is also trying to move away, within the basic school, 
from an academic teaching approach to basic literacy, 
from an emphasis on reproduction of knowledge to 
development of skills, and from “dry” theory to more 
real-life situations. Teaching methods are changing: 
in addition to using the traditional lecture style of 
teaching, teachers are being encouraged to use various 
active teaching methods. In short, the prevailing model 
of a reproductive education system is being rejected, 
and an interpretative education system created. 

Research questions and methods 

The main research focus of this paper is to explore the 
extent to which changes in Lithuania’s educational 
school environment (in association with changes 
within political, social, and home spheres) appear to be 
reflected in students’ mathematics achievement. More 
specifically: Can we detect changes in mathematical 
literacy level while an educational reform is taking place? 
Are students achieving, on average, at a higher level 
in mathematics than they did at the beginning of the 
educational reform? 

We can answer such questions not only by 
analyzing the present situation, but also by 
comparing it with the situation at the beginning of 
Lithuania’s political independence. The way to do 
this is to conduct a longitudinal study, during which 
researchers collect data on students’ mathematics 
achievement and data relating to factors that have a 
bearing on those achievements. The only research of 
this type conducted in Lithuania is TIMSS (Trends 
in International Mathematics and Science Study), 
organized by the International Association for the 
Evaluation of Educational Achievement (IEA). 
Lithuania has participated in the three cycles of this 
research, conducted in 1995, 1999, and 2003. 

Lithuania’s continuing participation in TIMSS 
has thus provided the country with the opportunity to 
evaluate the effectiveness of its educational 
development, to document the changes, and to identify 
possible general problems in education. In Lithuania, 
TIMSS has been the only study of educational 
achievement conducted consistently throughout the 
time of the educational reform. 

The information gathered through Lithuania’s 
participation in TIMSS relates (for the 1995 cycle) 
to students who learned from mathematics textbooks 
translated from the Russian and Estonian languages, 
and (for the 1999 and 2003 cycles) to 
students who had studied from textbooks written by 
Lithuanian authors. In Lithuania, only students in 
Grade 8 were tested during the three cycles. 

TIMSS uses the IRT (Item Response Theory) 
scaling methodology (in which the mean on the 
scale of student achievement scores is set at 500 and 
the standard deviation at 100). This allows each 
participating country not only to compare the average 
level of achievement for its students with the average 
levels of achievement in the other participating 
countries but also to compare the results for its own 
students across all three cycles of the study. 
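
As a rough illustration of what such a reporting scale means, the sketch below maps a standardized ability estimate (mean 0, standard deviation 1) onto a scale with mean 500 and standard deviation 100. The function name `to_reporting_scale` and the example values are hypothetical, and the operational TIMSS scaling procedure is considerably more involved than this single linear transformation.

```python
def to_reporting_scale(theta, mean=500.0, sd=100.0):
    """Map a standardized ability estimate onto the reporting scale.

    theta is assumed to be expressed in standard-deviation units
    around an average of zero (an illustrative assumption).
    """
    return mean + sd * theta


if __name__ == "__main__":
    # Hypothetical ability estimates, not TIMSS data.
    for theta in (-1.0, 0.0, 0.5):
        print(f"theta = {theta:+.1f} -> scale score {to_reporting_scale(theta):.0f}")
```

Because every cycle is reported on the same scale, a score of, say, 502 in one cycle can be read against a score of 472 in an earlier cycle.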

The TIMSS results for Lithuanian students and 
the changes in these results over the three cycles have 
received minimal analysis. Zabulionis (1997a, 1997b, 
2001) and Trakas (1997) undertook some analysis 
of the 1995 results, and the several publications 
produced on TIMSS within Lithuania offer only a 
limited presentation, without analysis, of the results 
(Cekanavicius, Trakas, & Zabulionis, 1997; Dudaite, 
Elijio, Urbiene, & Zabulionis, 2004; Mackeviciute 
& Zabulionis, 2001). Dudaite (2006) edited a 
text presenting analyses of Lithuanian students’ 
mathematics results for the period 1995 to 2003. 

The present paper offers a further analysis of 
the changes in Lithuanian students’ mathematics 
achievement in the TIMSS assessment across the three 
cycles. In 1995, 2,547 Grade 8 students from Lithuania 
participated in the study; 2,361 students participated 
in 1999; and 5,737 students in 2003. The main goal 
of the analysis was to identify changes in Lithuanian 
Grade 8 students’ mathematics achievement results 
in TIMSS from 1995 to 2003 and to offer possible 
explanations for changes. The data for this work 
were drawn from the TIMSS 1995, 1999, and 2003 
databases, and the analyses involved content analysis 
and classical statistical investigations. The main 
statistical software used was SPSS 12.0. 

Review of Lithuanian students' achievement 
on the TIMSS mathematics tests 

Analysis of the TIMSS results showed a general 
improvement across the three cycles in the mathematics 
achievements of Lithuanian Grade 8 students. The 
students’ average score on the 1999 scale was 10 points 
higher (SE = 6.1) than on the 1995 scale, but this 
difference was not statistically significant. However, the 
difference between the students’ average scores on the 
mathematics scale for TIMSS 1999 and TIMSS 2003 
was much larger (20 points, SE = 5.0) and statistically 
significant (Mullis, Martin, Gonzalez, & Chrostowski, 
2004). We can gain a clearer perspective on the import 
of this increase by comparing the Lithuanian results 
with the results of the other countries that participated 
in all three TIMSS cycles. 
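
The significance statements above can be checked with a simple calculation. The sketch below divides each reported difference by its standard error and compares the result with the two-sided 5% critical value of the standard normal distribution. The function name `z_test` is hypothetical, and the plain z-test is an assumption; the source report may have applied additional adjustments.

```python
import math


def z_test(difference, se_difference, critical=1.96):
    """Return (z, two-sided p-value, significant?) for a difference and its SE."""
    z = difference / se_difference
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value, abs(z) > critical


if __name__ == "__main__":
    # Differences and standard errors as quoted in the text.
    comparisons = [("1995 to 1999", 10, 6.1), ("1999 to 2003", 20, 5.0)]
    for label, diff, se in comparisons:
        z, p, significant = z_test(diff, se)
        print(f"{label}: z = {z:.2f}, p = {p:.4f}, significant = {significant}")
```

Under these assumptions the first difference gives z of about 1.64, which falls short of 1.96, and the second gives z = 4.0, which exceeds it.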

From Figure 1, we can see that the increase across 
the three cycles in Lithuanian students’ achievement 
was higher than the increase for any other country. 
(The shaded bars on the right-hand side of the figure 
signify increases in average mathematics achievement 
and those on the left signify a decrease.) Latvia, which 
neighbors Lithuania, had a 17-point increase in 
achievement between 1995 and 1999, but the country 
had progressed no further by 2003. Russia, another 
country neighboring Lithuania, saw a decrease in 
achievement from 1995 to 2003. The highest decrease 
in mathematics achievement from the first TIMSS 
assessment in 1995 to the third assessment in 2003 
was in Bulgaria (51 scale points). 

Comparison of Lithuanian students’ average 
mathematics results with the international averages 
for each cycle (Figure 2) is also informative. Figure 
2 shows us that across the three cycles of the TIMSS 
study, the international average decreased from 500 
to 467 scale points, but that the Lithuanian average 
increased from 472 to 502 scale points. 

Figure 1: Comparison of Average Mathematics Achievement of Students in the Countries that Participated in all 
Three TIMSS Cycles 

[Legend: achievement differences between 1995 and 1999; achievement differences between 1995 and 2003] 

Figure 2: Comparison of the Shift in International Average 
Achievement and Lithuanian Average Achievement of 
Grade 8 Students on the Three TIMSS Cycles 

[Legend: International average; Lithuania] 

In 1995, Lithuanian average achievement 
was significantly below the international average; 
Lithuania, in fact, appeared at the bottom of the 
country list. However, in 1999, Lithuanian average 
achievement was similar to the international average. 
In 2003, Lithuanian students performed very 
successfully and outstripped the international average; 
the difference was marked. It needs to be acknowledged 
that the international average had strongly decreased 
by 2003, but this must also be seen in relation to the 
fact that the countries participating in each TIMSS 
cycle were not the same. Also, the comparison between 
the international average for TIMSS 1995 and the 
Lithuanian results for TIMSS 2003 showed that, 
by 2003, Lithuanian Grade 8 students had reached 
the international average of 1995. As such, it is fair 
to say that Lithuania outstripped the international 
benchmark not by 35 points (502 against the 2003 
average of 467) but by only about two points (502 
against the 1995 average of 500). Another consideration 
is that, in 1995, the 
countries participating in TIMSS were nearly all 
West European and Asian countries, which tended to 
have the higher achievement results. It was therefore 
particularly useful for Lithuania to have these countries 
as a comparison point because of the generally higher 
results across the participating countries. By 2003, the 
list of participating countries had greatly expanded 
and included many developing countries. 

Having considered the Lithuanian Grade 8 
students’ general achievement, let us now take a closer 
look at particular results. I would like to suggest that 
each single TIMSS item can be examined as if it were 
an international mathematics mini-contest. Therefore, 
it is interesting to observe how many times Lithuanian 
students won or lost these competitions, and how 
these results changed over the years. The leaders of 
these contests undoubtedly were Asian countries: 
Singapore, Japan, Hong Kong SAR, South Korea, 
and Taiwan. A comparison of Lithuania with those 
countries that participated in the TIMSS study all 
three times shows that Lithuania in TIMSS 1995 was 
in the top five countries on only 1.9% of items (of 
155 items), and in the bottom five countries on 44.6% 
of items. TIMSS 2003 saw a roughly 
two-fold improvement on the first result (4.1%, of 
194 items), and a four-fold decrease on the second 
(10.3%) (see Figure 3). With this 
latter result, Lithuania took the lead amongst the 
countries participating in the study all three times. 

Consideration of the Lithuanian students’ results 
for the various mathematics content areas showed 
that in 1995 the knowledge and the abilities of these 
students in the five content areas were very different (see 
Figure 4). In 1995, the best-solved items were those 
relating to geometry (508 scale points), followed by 
those relating to algebra (488 scale points). The results 
for the other three mathematics content areas were 
much worse (measurement, 457 scale points; number, 
462; data, 465). By 1999, the differences between the 
students’ results in mathematics content areas had 
decreased slightly, with the best improvement evident 
for data (28 scale points). By 2003, the students’ levels 
of achievement in the different mathematics content 
areas had become very similar. Over the eight-year 
period, the results showing the least change were those 
for geometry and algebra, and those showing the most 
(and significant) improvement were for number and 
data. 

Figure 4: Changes in Lithuanian Students’ Results across 
the Three TIMSS Cycles by Mathematics Content Areas 

[Horizontal axis: Number, Algebra, Measurement, Geometry, Data. Legend: 2003; 1999; 1995] 

Analysis of the Lithuanian students’ results against 
the international benchmarks also showed constant 
improvement across the three TIMSS cycles (see 
Figure 5). Fewer students were at the low benchmark 
in 2003 (10%) than in 1995 (19%). 

Note: The advanced benchmark was set at 625 or more points on the 
scale; the high at 550-624 points; the intermediate at 475-549 points; 
and the low at 400-474 points. 

In summary, from 1995 to 2003, the average 
mathematics achievement of Lithuanian Grade 8 
students improved. Let us now consider possible 
explanations for that improvement. 



Figure 3: The Number of Times that Lithuanian Grade 8 Students’ Performance on TIMSS Mathematics Items 
Placed Them in the Five Top- and Five Bottom-performing Countries for Those Items 

[Bars for Lithuania by TIMSS cycle. TIMSS 2003: share of items in the TIMSS math test on which Lithuania is in the group of five bottom countries, 10.3%; share of items on which Lithuania is in the group of five top countries, 4.1%.] 

Figure 5: Trends in Percentages of Lithuanian Students 
at the International Benchmarks for the Three TIMSS Cycles 

[Horizontal axis: Advanced, High, Intermediate, Low. Legend: 2003; 1999; 1995] 



Explanations for improvement 

It is clear that Lithuania’s educational reform influenced 
the improvement in Lithuanian students’ mathematics 
achievement from the time of the first to the third 
cycle of TIMSS. In particular, it seems fair to say 
that this improvement was an outcome of the newly 
established educational standards, the rewritten study 
programs, and the mathematics textbooks written in 
the “spirit” of TIMSS. After Lithuania participated in 
the TIMSS assessment for the first time and achieved 
very low results, educational reform (including that of school 
mathematics) was directed more toward the style of the 
TIMSS items. This development signified recognition 
that one of the main objectives of the educational 
reform should be transformation from the conveyance 
of knowledge to the education of competence, from 
academic-style mathematics to mathematics literacy. 
Because TIMSS assesses students’ mathematics literacy, 
Lithuania’s participation in the study provided a good 
and appropriate impetus for this change. 

We can therefore partially explain Lithuania’s low 
level of achievement in the first TIMSS assessment by 
recognizing that in 1995 Lithuanian schools did not 
emphasize or teach mathematics literacy. Lithuanian 
students were used to a different type of mathematics, 
and therefore were not able to demonstrate their 
knowledge in TIMSS 1995. TIMSS 2003, executed 
after implementation of the educational reform, 
assessed students educated in contemporary Lithuanian 
schools. This argument alone is a solid one in explaining 
the marked improvement in the Lithuanian results in 
2003. 

Lithuanian mathematics study programs and the 
TIMSS frameworks 

I would now like to look at the differences and the 
similarities between the TIMSS research frameworks 
and Lithuanian mathematics study programs as well 
as the changes within them. In 1995 and in 1999, the 
structure of the TIMSS research had the following 
three dimensions (Robitaille, McKnight, Schmidt, 
Britton, Raizen, & Nicol, 1993): 

• Content: 
- Numbers 
- Measurement 
- Geometry 
- Proportionality 
- Functions, relations, equations 
- Data, probability, statistics 
- Elementary analysis 
- Validation and structure. 

• Performance Expectations: 
- Knowing 
- Using routine procedures 
- Investigating and problem-solving 
- Mathematical reasoning 
- Communicating. 

• Perspectives: 
- Attitudes 
- Careers 
- Participation 
- Increasing interest 
- Habits of mind. 

In 2003, the structure of the TIMSS research was 
somewhat changed; two structural dimensions were 
left, but they had been slightly amended (Mullis et al., 
2004): 

• Content Domains: 
- Number 
- Algebra 
- Measurement 
- Geometry 
- Data. 

• Cognitive Domains: 
- Knowing facts and procedures 
- Using concepts 
- Solving routine problems 
- Reasoning. 

Analysis of the Lithuanian study programs shows 
considerable differences between those programs 
written before the reform and those written during 
it. The reformed programs contain new themes such 
as statistical elements, elements of probability theory, 
combinatorics, elements of economics, elements of 
computer science, and problem-solving (mathematical 
reasoning). The detailed themes of algebra, geometry, 
and number remain almost the same as they were 
before the reform (Dudaite, 2000; Lietuvos Respublikos 
svietimo ir mokslo ministerija, 1997a, 1997b; Lietuvos 
TSR svietimo ministerija, 1988). 

Comparison of the mathematics content of 
the Lithuanian study programs with the TIMSS 
mathematics content shows that the pre-reform Lithuanian 
mathematics study content differs most substantially 
from the TIMSS 1995 frameworks because the former 
did not contain data representation, probability, 
and statistics topics, or elementary analysis, 
validation, and structure. However, in relation to 
other mathematics content themes, analysis reveals 
no differences between the content of the TIMSS 
1995, 1999, 2003 frameworks and the pre-reform 
and the post-reform Lithuanian study programs for 
mathematics. Thus, we could assume the low results 
for Lithuanian students in TIMSS 1995 were because 
some of the TIMSS questions tested knowledge or 
skills that the Lithuanian students had not learned. 
However, when we take into account the number 
of TIMSS 1995 items that matched the content of 
Lithuanian pre-reform mathematics study programs, 
we get a high result — 95.7% (Beaton et al., 1996). 

If only 4.3% of the TIMSS 1995 items did not 
match the content of the Lithuanian mathematics 
study programs, this difference alone could not have 
provided the reason for the Lithuanian students’ low 
results. It is also important to note that the Lithuanian 
students’ results for the data representation, 
probability, and statistics domains (data) in TIMSS 
1995 were not their lowest domain scores (465 scale 
points; in comparison: number, 462; measurement, 
457). With all this in mind, we must conclude that the 
improvement in the Lithuanian students’ mathematics 
results across the three TIMSS cycles can be only 
partially explained by change to the content of the 
Lithuanian mathematics study programs. 



Lithuanian mathematics teaching goals and the 
TIMSS frameworks 

Another point of comparison is mathematics teaching 
goals before and after the reform, and it is here that 
the influence of changes on students’ achievement 
in mathematics is particularly evident. In 1988, 
before the educational reform, the main mathematics 
teaching goals were formulated as follows (Lietuvos 
TSR svietimo ministerija, 1988): 

• To give knowledge 
• To form skills 
• To train logical thinking 
• To teach students how to use the knowledge in 
mathematics-related subjects 
• To prepare students in such a way that they could 
continue their studies. 

The teaching goals formulated during the 
educational reform in 1997 had a different tone 
(Lietuvos Respublikos svietimo ir mokslo ministerija, 
1997b): 

• To develop mathematical communication 
• To teach students to carry out standard mathematical procedures 
• To teach students to solve mathematical problems and to 
investigate 
• To seek mathematical reasoning 
• To foster positive attitudes toward mathematics 
• To encourage mathematical, scientific, and 
technological careers 
• To promote the studying of mathematics 
• To form a habit of mathematical, scientific thinking. 

In addition, the reformed study programs of 1997 
stated the main purpose of mathematics teaching 
to be that of guaranteeing mathematical literacy for 
all members of society. One main point, and a very 
important one, regarding the changes in wording in 
the programs relates to the appearance of the notion 
of mathematical literacy. Pre-reform, schools taught a 
more academic style of mathematics. Mathematical 
literacy was not something to aim for. 

Comparison of the goals of mathematics teaching 
formulated before and during the reform with the 
content of the TIMSS frameworks (Robitaille et al., 
1993) shows equivalency between the 1997-formulated 
goals and the following structural dimensions of the 
TIMSS 1995 and 1999 frameworks: “performance 
expectations” (all parts except the first one, “knowing”) 
and “perspectives” (all parts). The goals of mathematics 
teaching formulated before the reform were equivalent 
only to the first two parts of the TIMSS 1995 
framework’s dimension “performance expectations” 
(i.e., “knowing” and “using routine procedures”). 
Thus, the Lithuanian mathematics teaching goals 
articulated during the time of the reform (in 1997) are 
in essence equivalent to the TIMSS research format, 
but the same cannot be said of the goals set down 
before the reform. This means the Lithuanian students 
participating in TIMSS 1999 and 2003 had received 
mathematics education that accorded with the TIMSS 
“spirit,” a factor that explains, to a good degree, the 
significant improvement in the Lithuanian students’ 
mathematics results by 2003. 

Lithuanian mathematics textbooks and the TIMSS 
frameworks 

While the Lithuanian study programs and educational 
standards set down the goals of mathematics teaching, 
mathematics content areas, and detailed topics, they 
do not indicate how much time schools should spend 
on each topic. However, it is possible to establish 
approximate times through analysis of the mathematics 
textbooks. 

The students who participated in TIMSS 1995 
studied, during their Grades 5 and 6 years, the following 
textbooks, which were translated into Lithuanian from 
the Estonian language: 

• Nurkas, E., & Telgma, A. (1990). Matematika: 
Vadovelis V klasei [Mathematics: Textbook for Grade 5]. Kaunas: Sviesa 
• Nurkas, E., & Telgma, A. (1991). Matematika: 
Vadovelis VI klasei [Mathematics: Textbook for Grade 6]. Kaunas: Sviesa. 

In Grades 7 and 8, these students studied from 
textbooks translated from the Russian language: 

• Teliakovskis, S. (1991). Algebra: Vadovelis VII klasei 
[Algebra: Textbook for Grade 7]. Kaunas: Sviesa 
• Teliakovskis, S. (1990). Algebra: Vadovelis VIII-IX 
klasei [Algebra: Textbook for Grades 8-9]. Kaunas: 
Sviesa 
• Atanasianas, L., et al. (1991). Geometrija: Vadovelis 
VII-IX klasei [Geometry: Textbook for Grades 7-9]. 
Kaunas: Sviesa. 

According to the reformed mathematics study 
programs and educational standards, the students who 
participated in TIMSS 1999 and 2003 studied from 
textbooks written by Lithuanian authors: 



• Strickiene, M., & Cibulskaite, N. (1996). Matematika 5. Vilnius: TEV 
• Strickiene, M., & Cibulskaite, N. (1996). Matematika 6. Vilnius: TEV 
• Cibulskaite, N. et al. (1998). Matematika 7. Vilnius: TEV 
• Cibulskaite, N. et al. (1998). Matematika 8. Vilnius: TEV. 

Analysis of the Lithuanian mathematics textbooks 
reveals that the topics of algebra and geometry receive 
less attention than they did in the earlier texts but that 
number and measurement receive more (Zybartas, 
1999). The new texts also include new topics: statistics, 
probability theory, combinatorics, and mathematical 
reasoning. These features explain why the results for 
Lithuanian students in algebra and geometry changed 
little over the three TIMSS cycles, and why the areas of 
greatest improvement were number and data (statistics 
and probability). 

Students’ socio-educational home factors 

We cannot consider changes in Lithuanian students’ 
mathematics results without considering societal 
factors, such as changes in the students’ economic and 
educational home environments. The results of many 
studies show a strong relationship between students’ 
home socio-educational environment and their 
mathematics achievement. 

Let us therefore form a home socio-educational 
environment factor. Because the indicators that need 
to be taken into account must be in all three TIMSS 
cycles, the possible indicators are these: mother’s and 
father’s highest educational qualification, number of 
books at home, owning an encyclopedia, a dictionary, 
and a calculator, and having a work-table at home. 
Now let us take these possible indicators and use them 
to form a home socio-educational environment factor 
SES (Cronbach alpha: TIMSS 2003, 0.631; TIMSS 
1999, 0.557; TIMSS 1995, 0.383). Regression analysis 
showed a strong relationship between Lithuanian 
students’ mathematics results and their SES (see Figure 
6). From this figure, we can see that students from the 
same home socio-educational environments gained 
more mathematics points with each TIMSS cycle. 
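
For readers unfamiliar with the reliability coefficient quoted above, Cronbach's alpha for k indicators is k/(k - 1) multiplied by (1 minus the sum of the item variances divided by the variance of the total score). The sketch below applies this standard formula to a small made-up data set. The function name `cronbach_alpha`, the 0/1 coding of the indicators, and the five hypothetical students are assumptions for illustration only; they are not the TIMSS data or the SPSS procedure used for this paper.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Transpose so that each tuple holds one item's scores across respondents.
    items = list(zip(*rows))
    sum_item_variances = sum(pvariance(item) for item in items)
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


if __name__ == "__main__":
    # Hypothetical students x indicators (e.g., owns dictionary, calculator, ...).
    made_up = [
        [1, 1, 1, 0],
        [0, 1, 0, 0],
        [1, 1, 1, 1],
        [0, 0, 0, 0],
        [1, 0, 1, 1],
    ]
    print(f"alpha = {cronbach_alpha(made_up):.3f}")
```

Values closer to 1 indicate that the indicators vary together and so form a more internally consistent composite.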

It is interesting to observe the extent to which the 
students’ actual home socio-educational environment 
changed over the eight-year period. We can do this by 
forming an index with the previously used indicators: 
highest parental education, number of books at home, 
owning a calculator, an encyclopedia, and a dictionary, 
and having a work-table. Here, highest education 
of parents is categorized as follows: lower than or 
equivalent to ISCED 3; equivalent to ISCED 4; and 
equivalent to or higher than ISCED 5. Figure 7 
shows that students’ home socio-educational environment 
worsened over the eight years. Consequently, despite the 
strong relationship between the Lithuanian students’ 
mathematics results and their home backgrounds, SES 
does not explain the improvement in the students’ 
performance across the three TIMSS cycles. 



Figure 6: Relationship between Lithuanian Students’ 
Home Socio-educational Environment and Their 
Mathematics Results across the Three TIMSS Cycles 

[Legend: 1995; 1999; 2003] 

TIMSS    B          B1        Sig. 
1995     470.677    31.525    0.000 
1999     481.157    35.597    0.000 
2003     504.675    31.253    0.000 
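
The regression coefficients reported with Figure 6 can be used to compare predicted scores for students with the same home background in each cycle. The sketch below assumes that the first coefficient column is the intercept and the second the slope on the SES factor, and that SES is a standardized factor score; neither is stated explicitly in the text, and the function name `predicted_score` is hypothetical.

```python
# (intercept, slope) per cycle, as read from the table under Figure 6.
# Interpreting the two columns this way is an assumption.
COEFFICIENTS = {
    1995: (470.677, 31.525),
    1999: (481.157, 35.597),
    2003: (504.675, 31.253),
}


def predicted_score(year, ses):
    """Predicted mathematics score for a given cycle and SES factor value."""
    intercept, slope = COEFFICIENTS[year]
    return intercept + slope * ses


if __name__ == "__main__":
    # Illustrative SES values around the centre of the scale.
    for ses in (-1.0, 0.0, 1.0):
        scores = ", ".join(
            f"{year}: {predicted_score(year, ses):.1f}" for year in sorted(COEFFICIENTS)
        )
        print(f"SES = {ses:+.1f} -> {scores}")
```

For these illustrative SES values the predicted score rises from cycle to cycle, which is the pattern the text describes.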



Students’ attitudes toward mathematics 

Another possible explanation for the large improvement 
in Lithuanian students’ mathematics achievement 
relates to students’ attitudes toward mathematics as a 
subject. The correlation between student achievement 
and attitudes toward mathematics (measured by the 
statement, “I like mathematics X much”) in TIMSS 
1995 was 0.230, in TIMSS 1999, 0.288, and in TIMSS 
2003, 0.239. Figure 8 shows, for Lithuanian students, 
a clear improvement in attitude toward mathematics 
between 1995 and 1999, but a worsening of attitude 
between 1999 and 2003, despite the improvement in 
achievement over this latter period. The improvement 
in attitude in the former period was probably due to 
the introduction of the new mathematics textbooks in 
1996, which differed substantially from the previous 
textbooks in terms of design and variety and number 
of exercises. However, it seems that once the novelty of 
the new books wore off, mathematics again became a 
less interesting subject for many students. Nonetheless, 
it is likely that the higher test results of 2003 reflected 
the stronger interest in mathematics in 1999. 

Figure 7: Changes in Lithuanian Students’ Home Socio- 
educational Environment 

Figure 8: Relationship between Lithuanian Students’ 
Attitudes toward Mathematics and Their Mathematics 
Results across the Three TIMSS Cycles 
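
One way to read the correlations between attitude and achievement quoted in this section is through the squared correlation, which in a simple linear relationship gives the share of variance in one variable associated with the other. The sketch below assumes the reported values are Pearson correlations; the function name `explained_variance` is hypothetical.

```python
# Correlations between attitude and achievement as quoted in the text.
CORRELATIONS = {1995: 0.230, 1999: 0.288, 2003: 0.239}


def explained_variance(r):
    """Share of variance associated with a correlation r (r squared)."""
    return r * r


if __name__ == "__main__":
    for year, r in sorted(CORRELATIONS.items()):
        print(f"TIMSS {year}: r = {r:.3f}, r^2 = {explained_variance(r):.1%}")
```

On this reading, attitude is associated with well under a tenth of the variance in achievement in each cycle.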

Format of test questions 

Another possible reason for the Lithuanian students’ 
improved achievement was greater familiarity with the 
multiple-choice format of the test items. Most of the 
Lithuanian students who participated in TIMSS 1995 
had not encountered this format because it was rarely, 
if ever, used in the pre-reform mathematics textbooks. 
Analysis of omitted (not solved) TIMSS items 
illustrates this. Table 1 shows the average percentages 
of students who omitted items without solving them 
in the TIMSS 1995 and 2003 assessments. Students 
omitted the multiple-choice answer format items twice 
as often in 1995 as in 2003, but there was no significant 
difference between the two dates in the percentages of 
students omitting open-response items. 
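
The comparison of omission rates can be sketched as a test of the difference between two percentages, using the values and standard errors reported in Table 1. The sketch assumes that the 1995 and 2003 estimates are independent, so that the standard error of the difference is the square root of the sum of the squared standard errors, and it uses a two-sided 5% normal criterion. The function name `compare_percentages` is hypothetical, and the authors may have used a different test.

```python
import math


def compare_percentages(p1, se1, p2, se2, critical=1.96):
    """Return (difference, SE of difference, z, significant?) for two estimates."""
    difference = p1 - p2
    # Standard error of a difference between two independent estimates.
    se_difference = math.sqrt(se1 ** 2 + se2 ** 2)
    z = difference / se_difference
    return difference, se_difference, z, abs(z) > critical


if __name__ == "__main__":
    # Percentages omitted and standard errors from Table 1 (1995 vs 2003).
    rows = [
        ("multiple-choice", 7.50, 0.55, 3.60, 0.24),
        ("open-response", 20.52, 1.84, 23.38, 1.60),
    ]
    for label, p95, se95, p03, se03 in rows:
        diff, se, z, significant = compare_percentages(p95, se95, p03, se03)
        print(f"{label}: diff = {diff:.2f}, SE = {se:.2f}, z = {z:.2f}, "
              f"significant = {significant}")
```

Under these assumptions the multiple-choice difference gives z of about 6.5, while the open-response difference gives z of about -1.17, consistent with the pattern described in the text.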

We get a similar result with the trend items of 
TIMSS 1995 and 2003 (see Table 2). Once again, it 
can be seen that the students omitted the multiple- 
choice items twice as often in 1995 as they did in 
2003. A comparison showed little difference in the 
difficulty of the items. 

Table 1: Percentage of Multiple-choice (MC) and 
Open-response (OR) Test Items Omitted by Lithuanian 
Students in TIMSS 1995 and 2003 

TIMSS    MC omitted (%)      OR omitted (%) 
1995     7.50 (SE = 0.55)    20.52 (SE = 1.84) 
2003     3.60 (SE = 0.24)    23.38 (SE = 1.60) 

To check whether the Lithuanian students 
were indeed more proficient in 2003 at answering questions 
in the multiple-choice format, we can look at the 
extent to which students in 1995 and students in 2003 
omitted the most difficult and the easiest items. As 
Table 3 shows, Lithuanian students were about twice 
as likely in 1995 to omit these questions as they were 
in 2003. Thus, by 2003, many Lithuanian students 
were able to apply the strategies needed to answer test 
items in multiple-choice format. 

Conclusions 

The key findings of this paper are as follows: 

1. From 1995 to 2003, the average mathematics 
achievement of Lithuanian Grade 8 students 
improved significantly. 

2. Of the countries that participated in all three 
TIMSS cycles, Lithuania improved the most 
in terms of its Grade 8 students’ mathematics 
achievement. 

3. In 2003, Lithuanian Grade 8 students’ achievement 
across the different mathematics content domains 
was more homogenous than it was in 1995. 

4. From 1995 to 2003, the mathematics content 
domains in which Lithuanian Grade 8 students 
improved most were data and number. The least 
amount of change was evident for algebra and 
geometry. 

5. In relation to solving particular test items, the 
performance of Lithuanian Grade 8 students 
placed Lithuania amongst the five top countries 
twice as often in 2003 as in 1995. The students’ 
performance placed Lithuania in the list of the 
five bottom countries four times less often in 2003 
than in 1995. 

6. The number of Lithuanian Grade 8 students below 
the low international benchmark for mathematics 
achievement decreased between 1995 and 2003. 

7. The improvement in Lithuanian students’ 
mathematics achievement over the period 1995 
to 2003 is best explained by educational reform 
encompassing revised mathematics study programs, 
educational standards, and mathematics textbooks 
written in the TIMSS “spirit.” 

Table 2: Percentage of Multiple-choice Trend Items 
Omitted by Lithuanian Students in TIMSS 1995 and 
2003 

TIMSS    MC omitted (%)      Item difficulty (average) 
1995     6.83 (SE = 0.84)    57.12 (SE = 3.48) 
2003     3.18 (SE = 0.37)    61.45 (SE = 3.09) 

Table 3: Differences between Lithuanian Students’ Omission of 10 Most Difficult and 10 Easiest TIMSS 1995 and 
2003 Items (Multiple-choice Only) 

         10 most difficult items                   10 easiest items 
TIMSS    Omitted (%)          Item difficulty      Omitted (%)         Item difficulty 
1995     12.31 (SE = 3.59)    24.03 (SE = 1.96)    1.48 (SE = 0.49)    86.55 (SE = 1.16) 
2003     5.77 (SE = 0.73)     24.36 (SE = 2.18)    0.81 (SE = 0.10)    85.65 (SE = 1.38) 

Abstract 

This paper considers the influence of the stem format and 
answer format of survey items on the responses (results) for 
those items. The items referred to in this analysis are from 
TIMSS 2003, conducted by the International Association 
for the Evaluation of Educational Achievement (IEA). The 
analysis focuses on the national test booklets completed by 
806 Grade 8 students (from 150 schools) in Lithuania. In 
TIMSS 2003, Lithuania added two national booklets for 
mathematics, in which the TIMSS items were changed in 
a way that made it possible to examine the effect of the stem 
and answer format of an item on the results for that item. 
The analysis showed that Lithuanian Grade 8 students 
were significantly better at solving multiple-choice answer 
format items than open-ended answer format items. They 
also were less likely to omit multiple-choice than open- 
ended items. No difference was found in these regards 
between the genders. The analysis also showed that the 
distracters given in multiple-choice answer format items 
influenced students’ choice of answer. When students 
were presented with exactly the same items but without 
the answers, considerably fewer of them gave the answers 
given in the distracters. The hypothesis that formulating 
the stems of the TIMSS items more in “Lithuanian style” 
would help students solve the items was discounted. 
However, in general, the wording of the item stem did 
affect the answers to (the results of) the item. 



Introduction 

When educational research involving achievement 
tests is conducted, items offering multiple-choice 
answers and items requiring open-ended answers are 
used. There is constant debate among researchers on 
what proportion of a test should include multiple- 
choice items and what proportion should be open- 
ended. Each type of item has its benefits and its faults. 
With multiple-choice, more material can be covered 
and in a shorter time than is possible with open-ended. 
Multiple-choice items guarantee an easy, absolutely 
objective and cheaper way of marking. If these items 
are well made, they correlate well with the results 
received from solving other types of items, they allow 
easy identification of standard mistakes, and they are 
especially appropriate when the choices of possible 
answers can be clearly determined. On the negative 
side, it is difficult to choose appropriate distracters for 
these items, and it is difficult to examine higher levels 
of ability (problem-raising, argumentation, etc.). Also, 
these items can be easily “copied,” and some answers 
may be guesses. With open-ended items (including 
short-answer, short-solution, structured), there is no 
need to choose appropriate distracters or to consider 
that answers might be guesses. In addition, these items 
are suitable for evaluating various levels of ability. 
However, solving open-ended items takes a lot of 
time, which means tests made up of these items may 
not cover as many topics as tests made up of multiple- 
choice items. Open-ended questions are more difficult 
to evaluate than multiple-choice, and involve greater 
subjectivity and expense during marking. Often, 
with these items, the most typical mistakes are not 
identified. 

Because these different answer types have their pros 
and cons in terms of developing and administering 
tests, it is important to know what impact the two types 
have on the answers to (results for) each type. To what 
extent does the answer format influence the difficulty 
of the item for students and to what extent does it 
influence students not answering (omitting) that item? 
Efforts to answer these questions have produced many 
studies, but the findings of these differ. Answer format 
appears to have a different effect on different teaching 
subjects as well as at different grade levels. 

According to Elley and Mangubhai (1992), answer 
format does not have a statistically significant differential 
influence on the results for items in large-scale tests of 
reading ability. Hastedt’s (2004) analysis of the tests 
for the international reading ability study PIRLS 2001 
shows the opposite: that answer format does affect 
the results for the given items. Hastedt found that, 
on average, students are better able to solve multiple- 
choice items than open-ended items. The difference 
was statistically significant. Abrahamson (2000) 
concluded that the different answer formats of items in 
physics tests had no influence on the results for those 
items. Nasser (n.d.) claims that in tests of statistical 
literacy, students are worse at answering multiple- 
choice items than open-ended items. Gadalla (1999), 
considering the field of mathematics computation, 
found no statistically significant differences in the 
results for the different answer formats of items in tests 
given to Grades 4, 5, and 6 students. However, Grades 
2 and 3 students answered the multiple-choice items 
statistically significantly better than they answered the 
open-ended items. Traub (1993) found the different 
answer formats had no effect on the results for tests 
relating to the quantitative domain. Zabulionis (1997) 
analyzed the TIMSS 1995 tests completed by Grades 
7 and 8 students from Eastern European countries. He 
stated that in some of these countries, students were 
more likely to omit a multiple-choice format item than 
an open-ended format item, but that in the majority 
of the countries, answer format had no bearing on 
whether or not students omitted the item. 

Because the results of these various studies differ, 
it is useful to add to the debate by analyzing how the 
answer format of the test items, as well as the stem 
format of the items, influenced the results for these 
items in the mathematics component of TIMSS 2003. 
The results discussed in this paper relate to those for 
the Grade 8 students from Lithuania who participated 
in TIMSS 2003. 

Method 

The analysis focused on the TIMSS 2003 Grade 8 
national test booklets for Lithuania, answered by 806 
students from 150 schools. In TIMSS 2003, Lithuania 
added two national booklets for mathematics, in which 
the TIMSS items were changed in a way that allowed 
us to do the following: 

1. Verify the effect of the answer format of the item on 
the item results by determining: 

1.1. Whether the results for the multiple-choice 
answer format items differed from the results 
for the open-ended answer format items; 

1.2. How the distracters of multiple-choice answer 
format items influenced the answers to this 
type of item. 

2. Verify the effect of the stem format of the item on 
the item results by determining: 

2.1. Whether students were better able to solve 
TIMSS items rephrased in the “Lithuanian 
style” than in the original style (by “Lithuanian 
style,” we mean the formulations more 
commonly used in Lithuanian mathematics 
textbooks); 

2.2. Whether the students’ responses to the TIMSS 
items differed when the phrasing of the item 
stem was changed in a certain way. 

The changes encompassed the following: 

1. The TIMSS multiple-choice answer format items 
were changed to open-ended answer format, and 
vice versa. 

2. The TIMSS items were rephrased in these ways: 

2.1. The stems were rephrased in “Lithuanian 
style;” 

2.2. The stems were rephrased in other different 
ways; 

2.3. The fractions in the stems were written in 
words, and vice versa. 

3. A few extra “TIMSS-style” items were created so 
that the above-mentioned ideas could be verified. 

4. A number of TIMSS items remained unchanged 
so that we could check if the achievements of the 
students who answered the national booklets were 
identical to the achievements of the students who 
answered the TIMSS booklets. 

In line with the main goal of the analysis in this 
paper, that is, verifying the effect of the answer format 
and the stem format of items on item results, several 
hypotheses were formulated: 

1. Students solve items in multiple-choice answer 
format better than items in open-ended answer 
format. 

2. Students are more likely to omit items that require 
open-ended answers than they are to omit items 
written in multiple-choice answer format. 

3. The distracters of the multiple-choice items influence 
choice of answer (when solving items, students are 
less likely to give a particular wrong answer to the 
questions if the questions have an open-ended 
answer format than they are if they have to select 
from the options offered under the multiple-choice 
format). 

4. The phrasing of the stem of the item influences 
students’ ability to answer the item correctly. 

5. Students are more likely to correctly answer items written in 
“Lithuanian style” than items written in standard 
TIMSS style. 

6. Students are less likely to correctly answer items 
where fractions in the item stem are written in 
words. 

Results 

Grade 8 students in Lithuania were statistically 
significantly better at solving multiple-choice answer 
format items than they were at answering open-ended 
answer format items, and they were less likely to omit 
the former than the latter (see Figure 1). Forty-two 
items were analyzed. As we can see from the figure, 
the difficulty and omitting curves of the items with 
the multiple-choice answer format cover the curves of 
the items with the open-ended answer format. Thus, 
if required to answer an item presented in a multiple- 
choice answer format, students, on average, were better 
able to solve it and less likely to omit it than they were 
if the same item was presented in an open-ended 
answer format. Figure 2 presents the average difficulty 
and the average omission rate of the items in each 
answer format. There was no difference between 
the genders in solving the different answer format 
items, both in terms of the difficulty of the item and 
the omitting of the item. 
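
The comparison described above pairs each item's multiple-choice version with its open-ended version. A minimal sketch of that kind of paired summary follows, assuming "difficulty" means the percentage of students answering correctly. The function name and the three sample values are hypothetical and are not data from the study.

```python
from statistics import mean


def paired_format_gap(mc_pct, oe_pct):
    """Return per-item differences (MC minus OE) and their mean.

    mc_pct and oe_pct hold one percentage per item, in the same item order.
    """
    if len(mc_pct) != len(oe_pct):
        raise ValueError("each item needs one MC and one OE value")
    diffs = [mc - oe for mc, oe in zip(mc_pct, oe_pct)]
    return diffs, mean(diffs)


# Hypothetical percent-correct values for three paired items (not study data).
mc_correct = [60.0, 45.0, 72.0]
oe_correct = [50.0, 40.0, 65.0]

diffs, avg_gap = paired_format_gap(mc_correct, oe_correct)
print(diffs, avg_gap)
```

A positive mean gap would indicate that, on average, more students solved the multiple-choice version. The same function could be applied to omission percentages.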



Figure 2: Average Percentages of Lithuanian Grade 8 
Students Who Correctly Answered and Who Omitted 
Test Items Written in Multiple-choice Format and Those 
Written in Open-ended Format 




Legend: Difficulty, Omitted 

With items in multiple-choice answer format, the 
given distracters influenced students’ choice of answer. 
When students were given exactly the same items 
without the multiple-choice answers, considerably 
fewer of them provided the answers given in the 
distracters in the multiple-choice version. This finding 
suggests that the distracters, which are considered 
typical mistakes, need to be evaluated with much 
care. 

Figure 1: Percentages of Lithuanian Grade 8 Students Who Correctly Answered and Who Omitted Test Items Written 
in Multiple-choice Format and Those Written in Open-ended Format 

Legend: MC difficulty, MC omitted, OE difficulty, OE omitted 

Figure 3 presents the results for one such item. (Chi-square 
analysis shows that the answers students gave to 
an item depended on its answer format; χ² = 481.368, 
df = 6, p = 0.000.) We can see from the figure that 
in the multiple-choice case, the most frequently chosen 
wrong answers were C and A. Students’ choice of distracter 
A suggests they “forgot” that an hour consists of 60 
minutes, not 100. This mistake is a common one in 
basic school. Those students who chose distracter C 
were perhaps misled by the “similarity” of 20 minutes 
and 1/2. In the open-ended case, we can see that 
almost nobody gave the wrong answers 1/5 and 1/2 
(given in distracters A and C in the multiple-choice 
format case), but that many of the students gave the answer 
3/4 (which was distracter E). These results indicate 
the different typical mistakes that students tend to 
make when answering the same question presented 
with different answer formats. 
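
The arithmetic behind the item and behind distracter A can be checked directly. The short sketch below assumes the item asks for the 20 minutes between 1:10 and 1:30 as a fraction of an hour. The variable names are illustrative only.

```python
from fractions import Fraction

# Minutes elapsed between 1:10 and 1:30.
elapsed_minutes = 30 - 10

# Correct reading: an hour has 60 minutes.
correct = Fraction(elapsed_minutes, 60)

# The slip attributed to distracter A: treating an hour as 100 minutes.
base_100_error = Fraction(elapsed_minutes, 100)

print(correct, base_100_error)  # 1/3 1/5
```

The second value matches the 1/5 that the text gives for distracter A.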

The wording of the stem of an item also influences 
the answers students give to that item. However, it is 
difficult to determine what change in the stem might 
make the item easier or more difficult for students 
to answer correctly. For example, when the worded stem 
(“calculate the expression and write down the answer in 
decimals”) was removed from one item, leaving only the 
expression to be calculated (the sum of two simple 
fractions), 20% more students were unable to solve the 
item than when the worded stem accompanied the 
expression. But in another, similar item (the difference of 
two simple fractions), 20% more students were able to 
solve the item after removal of the worded stem. 



Figure 3: Difference in Students’ Answers for the Same 
Question Presented in Multiple-choice and Open-ended 
Answer Formats 

Question: What fraction of an hour has passed between 1:10 a.m. and 1:30 a.m.? 

Answer       MC        OE 
A (1/5)      12.6%     1.8% 
B            50.0%     39.3% 
C (1/2)      17.5%     1.3% 
D            10.1%     0.3% 
E (3/4)      3.1%      2K5 % 
Other        -         23.5% 
Omitted      5.8%      5.5% 



Rephrasing some of the TIMSS items in different 
ways produced various results. For example, students 
were less able to answer an item correctly when the 
unknown quantity (x) rather than the number was 
written. Similarly, students were better able to solve 
an item when minutes instead of the part of an hour 
were written. 

By making the stems of TIMSS items more 
Lithuanian-like, we hoped that the more familiar 
wording of the stem would help more students solve 
the item. However, while students were better able to 
solve some of these rephrased items (for an example, 
see Table 1), they were less able to solve others correctly 
(see Table 2), and for some items, students were just as 
likely to answer the Lithuanian-styled item incorrectly 
as they were the originally worded item. Although 
rewording the item stems brought the familiarity of 
Lithuanian style to the students, it is possible that, 
with some items, rewording produced a longer and 
more complicated stem, which may have influenced 
students’ ability to answer the item correctly. 

Rephrasing the item stem by writing down the 
fraction in words was the change that most often altered 
the difficulty of the item. In most cases, students were 
less able to solve items worded in this way, in some 
cases students were just as likely to solve the originally 
worded as the rephrased items, and, in a very few 
cases, students were better able to solve the item when 
it was reworded. Figure 4 provides an example in 
which students were considerably less able to solve the 
item when the fraction (1/4) was written in words (“a 
quarter”); the answers depend on the item format (χ² = 
102.358, df = 5, p = 0.000). Here, we can see that with 
the item in the left-hand column, the main mistake 
was distracter A, which meant that when doing their 
calculation, students used just two fractions, 1/2 and 
1/5 (i.e., those written as fractions). These students 
did not recognize “a quarter” as a number. Almost 10% 
of the students chose distracter B, again indicating 
that students did not use “a quarter” when doing their 
calculation. In the case in the right-hand column, we 
can see that fewer students chose distracters A and B. 
Here, the main mistake was distracter C, which clearly 
shows that the students were using all three fractions 
to do their calculation. 
Table 1: Example of a TIMSS Item that Students Were Better Able to Solve after It Had Been Reworded in 
“Lithuanian Style” 

             TIMSS stem               Lithuanian stem 
Item stem    If 4(x+5)=80, then x=    Solve the equation: 4(x+5)=80 
True         50.9%                    59.3% 
False        30.9%                    27.3% 
Omitted      16.6%                    10.1% 

Note: χ² = 19.521, df = 2, p = 0.000. 
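
The chi-square statistics reported in this paper test whether the distribution of answers depends on the version of the item. A minimal sketch of the standard Pearson chi-square test of independence is given below. The function name is illustrative, and the counts are hypothetical: the paper reports percentages rather than the number of students per version, so this sketch does not attempt to reproduce the reported statistic.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a count table.

    table is a list of rows; each row lists observed counts per answer category.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if answer category were independent of item version.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts (not study data): columns are correct, incorrect, omitted.
counts = [
    [51, 31, 18],  # original stem
    [59, 27, 14],  # reworded stem
]
statistic, df = chi_square_independence(counts)
print(round(statistic, 3), df)
```

With two item versions and three answer categories the test has (2 - 1)(3 - 1) = 2 degrees of freedom, which agrees with the df reported in the note to Table 1.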






Table 2: Example of a TIMSS Item that Students Were Less Able to Solve after It Had Been Reworded in 
“Lithuanian Style” 

TIMSS stem: Which of these is 370·998+370·2? 
Lithuanian stem: Carry out the number before the parenthesis: 370·998+370·2. Which expression will you get after the calculation? 

Answer         TIMSS stem    Lithuanian stem 
A 370·1000     47.5%         35.9% 
B 372·998      4.5%          5.2% 
C 740·998      17.1%         12.0% 
D 370·998·2    28.2%         37.0% 
Omitted        2.4%          3.7% 

Note: χ² = 19.521, df = 2, p = 0.000. 
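
The item in Table 2 rests on the distributive law. The check below assumes the operators between the numbers are multiplication signs, a reading suggested by the instruction to carry the number out before the parenthesis and by option A. The option values are taken from the table under that assumption.

```python
# Value of the expression in the item stem, reading the operators as multiplication.
original = 370 * 998 + 370 * 2

# Answer options as listed in the table, under the same reading.
options = {
    "A": 370 * 1000,
    "B": 372 * 998,
    "C": 740 * 998,
    "D": 370 * 998 * 2,
}

# Factoring out 370 gives 370 * (998 + 2) = 370 * 1000.
matching = [label for label, value in options.items() if value == original]
print(matching)  # ['A']
```

Under this reading only option A equals the original expression.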



Conclusions 

1. The difficulty of the open-ended answer format 
items was higher than that of the multiple-choice 
answer format items. 

2. Students more frequently omitted the open-ended 
answer format items than the multiple-choice 
answer format items. 

3. The distracters of the multiple-choice items 
influenced the answers that students gave. When 
answering open-ended answer format items, 
students were less likely to give a particular wrong 
answer than they were when having to choose from 
the possible answers listed for a multiple-choice 
question. 



4. The wording of an item’s stem statement influenced 
students’ ability to solve the item. 

5. Students were no more likely to correctly answer 
items written in “Lithuanian style” than they were to 
correctly answer items written in original “TIMSS 
style.” 

6. Students were less likely to answer items successfully 
when the item-stem used fractions written in 
words. 
Introduction 

This paper examines the differences among and within 
the Australian States in science teaching and learning 
based on the analysis of data from TIMSS. It focuses 
on science achievement at Grade 8 in 2002. The paper 
begins with a consideration of the differences among 
states in science achievement at Grade 4 and Grade 8 
and the way in which patterns changed between 1994 
and 2002. It then examines the influence of factors 
operating at state, school, and student levels on science 
achievement at Grade 8 in the national picture and the 
way those influences differ among states. It concludes 
with a discussion of the factors influencing Grade 8 
science achievement. 

Context 

Australia’s national goals for schooling assert that 
by the time students leave school they should have 
attained high standards of knowledge, skills, and 
understanding in eight key learning areas: the arts; 
English; health and physical education; languages other 
than English; mathematics; science; studies of society 
and environment; and technology. The Performance 
Measurement and Reporting Taskforce (PMRT) 1 
administers the National Assessment Programme 
(NAP). This defines key performance measures and 
monitors progress toward the achievement of the 
national goals (MCEETYA, 2005). 

The Trends in International Mathematics and 
Science Study (TIMSS) is defined as part of the 
NAP. For each cycle of TIMSS, extensive national 
reports are produced that detail the pattern of results 
for Australia (Thomson & Fleming, 2004a, 2004b; 
Lokan, Hollingsworth, & Hackling, 2006; Lokan, 
Ford, & Greenwood, 1996, 1997). In addition, the 
NAP incorporates annual assessments of literacy and 
numeracy that use the full population of students 
at Grades 3, 5, and 7. Assessments for civics and 
citizenship and for ICT literacy are conducted every 
three years for sample surveys of students in Year 6 and 
Year 10. For science, there is a sample survey at Grade 
6 every three years. 

Australia has a federal system of government, with 
states having the major responsibility for education. 
There are differences among states in educational 
organization and curriculum in many fields, including 
science education, and there is increasing interest in 
examining the differences among states in fields such 
as science and mathematics. There are also differences 
among the states in the age of students at any given 
grade and in the demographic characteristics of the 
population. Table 1 contains an indication of some of 
these variations. 

Boosting science learning has become a priority 
of the federal government. A national review of the 
quality and status of science education in Australian 
schools concluded that there was a gap between the 
ideal and reality, especially in secondary schools and 
particularly in relation to the teaching of science as 
scientific literacy (Goodrum, Hackling, & Rennie, 
2001). The TIMSS 1999 Video Study reported 
that Australian lessons were characterized by a core 
pedagogical approach that involved analyzing data 
gathered through independent practical activity and 
focusing on connections between ideas and real-life 
experiences (Lokan, Ford, & Greenwood, 2006). 

There is no common school curriculum in science 
across the country, although there is a non-mandatory 
national statement of learning in science that outlines 
the learning opportunities that should be provided 
at each stage of schooling from Grade 1 to Grade 
10 (Curriculum Corporation, 2006). A national 
online science assessment resource-bank (SEAR) has 
been developed for use by schools to support science 
teaching. Within states, the pattern is that central 
authorities specify broad curriculum frameworks 
and schools have considerable autonomy in 
deciding curriculum detail, textbooks, and teaching 



1 Established by the Ministerial Council for Education, Employment, Training, and Youth Affairs (MCEETYA). 